Anders Borum
Hello!
I'm implementing support for disk based caching of binary resources (blobs)
residing in a SQL database. This post is about choosing the right strategy.
Because of the web environment, there are potentially many concurrent
requests to a resource. I would like to keep the application responsive
(continue to serve requests for resources) while streaming resources to
disk.
The scenario is a number of concurrent requests (e.g. 10 or 20) asking
for a resource (e.g. 32 MB in size) that has not yet been cached on disk.
To serve each request (and keep the application responsive), one approach
is to start streaming the resource from the DB to the client response and
simultaneously queue a task to the thread pool that streams the resource
to disk.
I'm thinking about implementing a producer / consumer pattern here; a
producer creates tasks that the consumer picks up and starts streaming to
disk.
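To make the producer / consumer idea concrete, here is a minimal sketch. It is written in Python purely for illustration and is not my actual implementation; `start_cache_writer` and `write_to_disk` are hypothetical names, and a single consumer thread is an assumption.

```python
import queue
import threading


def start_cache_writer(tasks, write_to_disk):
    """Start one consumer thread that drains `tasks` until it sees None.

    `write_to_disk` is a hypothetical callback standing in for the code
    that streams one resource from the DB to a cache file.
    """
    def consume():
        while True:
            resource_id = tasks.get()
            if resource_id is None:  # sentinel: stop the consumer
                tasks.task_done()
                break
            try:
                write_to_disk(resource_id)
            finally:
                tasks.task_done()  # mark the task finished even on error

    worker = threading.Thread(target=consume, daemon=True)
    worker.start()
    return worker


# Producer side: request handlers only enqueue an id and return at once.
written = []
tasks = queue.Queue()
worker = start_cache_writer(tasks, written.append)
for resource_id in ("a", "b", "c"):
    tasks.put(resource_id)
tasks.put(None)   # ask the consumer to stop
worker.join()
```

The point of the queue is that the request thread never waits for the disk write; it hands over the resource id and carries on serving the client.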
Another approach would be to receive the request and check whether the file
exists (using a cache for quick lookups). If it is not cached, check
whether a "streaming register" shows that the file is
currently being streamed to disk. If it is not in the "streaming register",
queue a task to the thread pool (each worker thread is responsible for
registering / unregistering its own streaming process).
Locking semantics should ensure that only a single thread streams a given
file to disk (i.e. it makes no sense for two threads to
stream the same file to disk in parallel).
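A rough sketch of what I mean by the "streaming register" and its locking, again in Python for illustration only. The class and function names (`StreamingRegister`, `try_register`, `cache_once`) are hypothetical, and the register here is just an in-memory set guarded by one lock, which assumes a single process.

```python
import threading


class StreamingRegister:
    """Tracks which resources are currently being streamed to disk."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def try_register(self, resource_id):
        """Return True if the caller now owns the write, False otherwise."""
        with self._lock:  # check-and-add must be one atomic step
            if resource_id in self._in_progress:
                return False
            self._in_progress.add(resource_id)
            return True

    def unregister(self, resource_id):
        with self._lock:
            self._in_progress.discard(resource_id)


def cache_once(register, resource_id, write_to_disk):
    """Stream to disk only if no other thread is already doing so."""
    if not register.try_register(resource_id):
        return False  # someone else is writing this file; skip
    try:
        write_to_disk(resource_id)
        return True
    finally:
        register.unregister(resource_id)  # always release, even on error
```

The lock is held only for the lookup and the set update, not for the disk write itself, so requests for other resources are not blocked while a large file is being streamed.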
I've already got all the major parts in place:
1. Authentication of requests to resources.
2. Managing HTTP headers (status codes, MIME types etc.).
3. Streaming large resources to disk from the DB (files are stored with a
unique identifier (guid) plus a .cache extension).
4. Streaming large resources to a client response from the DB (using the
chunk pattern).
5. Transmitting disk based resources to a client response (using IIS
infrastructure for high performance).
6. Scavenging of disk based resources that have not been requested within a certain time threshold.
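For reference, this is roughly what I mean by the chunk pattern in items 3 and 4, sketched in Python for illustration only. `copy_in_chunks` is a hypothetical helper and the 64 KB chunk size is an assumption, not a measured optimum.

```python
import io


def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy `source` to `destination` one chunk at a time.

    Only `chunk_size` bytes are held in memory at once, so a 32 MB blob
    never has to be loaded in full. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the DB reader and the file / response.
blob = io.BytesIO(b"x" * 150_000)
target = io.BytesIO()
copied = copy_in_chunks(blob, target)
```

The same loop serves both destinations: the cache file on disk and the client response stream.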
I guess what I'm asking for are guidelines (or "do's" and "don'ts"). Am I
working in the right direction?