File processor

Peter Morris

Hi all

This is a bit vague I suppose :-) Tomorrow I need to write a service which
monitors two folders for new files and performs tasks appropriately. Some
of these tasks are not too intensive and some are. Here's a scenario:

Event: \Incoming\SomeFile.txt
Action: Copy to a backup folder. Move it elsewhere

Event: \Incoming\SomeFile.zip
Action: Copy to a backup folder. Unzip a file within it elsewhere

Event: \Outgoing\SomeFile.txt
Action: Copy to a backup folder. Move it elsewhere

Event: \Outgoing\SomeFile.xml
Action: Parse the XML, generate a binary file, zip the binary file, backup
the zip file, copy the zip elsewhere.


In most of these cases the task is quick; in the final case the task could
take up to a couple of minutes. I really need to look into this in great
detail in the morning, but I am hoping to get a bit of a head start :-)


01: Is there a class for monitoring new files in a folder and triggering an
event or something with the name of the new file?

02: I expect that once the event triggers I will stuff the filename into a
thread-safe queue. If I have a thread pool for the quick tasks and queue
tasks to perform I presume the thread automatically sleeps again once the
task is complete, is that right?


Thanks

Pete
 
Peter Duniho said:
FileSystemWatcher

Shame there wasn't a way of receiving notifications after the file is
created and the file handle closed. I had to write something to handle this
situation.
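
For readers with the same problem, here is a minimal sketch of one common workaround, not the code Pete actually wrote. It assumes that being able to open the file with exclusive access (FileShare.None) means the writer has closed its handle. The names IncomingWatcher, WaitUntilReady and Watch, and the retry counts, are made up for illustration.

```csharp
using System;
using System.IO;
using System.Threading;

static class IncomingWatcher
{
    // Hypothetical helper: returns true once the file can be opened with
    // exclusive access, which suggests the writer has closed its handle.
    public static bool WaitUntilReady(string path, int attempts, int delayMs)
    {
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                {
                    return true;
                }
            }
            catch (IOException)
            {
                // Still being written, locked, or missing; wait and retry.
                Thread.Sleep(delayMs);
            }
        }
        return false;
    }

    // Watches a folder and reports each newly created file once it is ready.
    public static FileSystemWatcher Watch(string folder, Action<string> onReady)
    {
        var watcher = new FileSystemWatcher(folder);
        watcher.Created += (sender, e) =>
        {
            // Assumed limits: 20 attempts, 250 ms apart.
            if (WaitUntilReady(e.FullPath, 20, 250))
                onReady(e.FullPath);
        };
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

The Created handler blocks while it polls, so in a real service it would probably be better to hand the path to a queue straight away and do the polling on a worker thread.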

Peter Duniho said:
Which thread? The thread pool thread? Yes, if there are no more thread
pool tasks queued, a thread pool thread will simply enter a wait state
until a new task is queued.

I decided against the thread pool. I have tasks which are immediate, short,
or long in duration. I didn't want them all in the same thread pool because
a few long tasks would hog it. I'm going to have 3 threads, each with its
own queue, and give them jobs to do. Adding to a queue will resume its
thread; running out of jobs will suspend it.
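
A rough sketch of that one-thread-per-queue idea, assuming .NET 4 or later for BlockingCollection (on older frameworks a Queue guarded by Monitor.Wait/Pulse would do the same job). The JobQueue class name is invented for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical JobQueue: one dedicated thread draining its own queue.
sealed class JobQueue : IDisposable
{
    private readonly BlockingCollection<Action> _jobs = new BlockingCollection<Action>();
    private readonly Thread _worker;

    public JobQueue(string name)
    {
        _worker = new Thread(Run) { Name = name, IsBackground = true };
        _worker.Start();
    }

    public void Enqueue(Action job)
    {
        _jobs.Add(job); // wakes the worker if it is blocked waiting
    }

    private void Run()
    {
        // GetConsumingEnumerable blocks while the queue is empty and ends
        // once CompleteAdding has been called and the queue is drained.
        foreach (Action job in _jobs.GetConsumingEnumerable())
        {
            try { job(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); }
        }
    }

    public void Dispose()
    {
        _jobs.CompleteAdding(); // let the worker finish the remaining jobs
        _worker.Join();
        _jobs.Dispose();
    }
}
```

With this, the service could create three instances, for example new JobQueue("immediate"), new JobQueue("short") and new JobQueue("long"), and have the file event handler pick the queue by file type. Jobs in one queue run strictly one after another, so a two-minute XML job only delays other jobs in the "long" queue.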


Thanks for the info!


Pete
 
I'm sorry to bother you, but I'm a little confused about this statement and
I was hoping you could clarify it for me.

Peter Duniho said:
Note that if your tasks are not i/o bound, and you expect there to be a
large number of them queued in a short time, using the built-in thread
pool is probably not a great idea, as your tasks will all wind up fighting
each other for the CPU, wasting lots of time in the process.

Are you saying that queuing a lot of CPU-bound tasks on the thread pool is
a bad idea?

That's not generally my understanding. Unless the tasks are long running,
the thread pool is well suited for this kind of task, and it is designed to
provide good performance based on the number of available CPUs. The fact of
the matter is that if you have many CPU-bound tasks, they will compete for
CPU time no matter what kind of threading strategy you use.

--
Regards,
Brian Rasmussen [C# MVP]
http://kodehoved.dk

 
Peter Morris said:
I decided against the thread pool. I have tasks which are immediate, short,
or long in duration. I didn't want them all in the same thread pool because
a few long tasks would hog it. I'm going to have 3 threads, each with its
own queue, and give them jobs to do. Adding to a queue will resume its
thread; running out of jobs will suspend it.

You might consider giving each task its own thread pool at the
outgoing end of each task queue. I did something kinda similar a few
years back where a worker thread would read its request queue, perform
the task, and finally write to a parallel response queue.

regards
A.G.
 
Thanks for the reply - please see my comments below.

Peter Duniho said:
No, not really. The thread pool doesn't do anything in particular to
match active threads with the CPU count. If the tasks are so short-lived,
and queued so infrequently that one just naturally has relatively few
threads competing with each other for the CPU, then that's fine. There's
probably no need to go to the extra effort to limit the number of active
threads at once.

According to the documentation
(http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx)
that's not entirely correct. The documentation says, "The thread pool
maintains a minimum number of idle threads. For worker threads, the default
value of this minimum is the number of processors." In other words, the
thread pool tries to avoid creating redundant threads based on the number of
CPUs. As you point out, having more threads than CPUs is wasteful.

Peter Duniho said:
Define "compete". The fact is, there's a good way to compete and a bad
way.

By competing I mean that the scheduler will switch between all runnable
threads with the highest priority. As the switching is expensive, it should
be minimized.

Anyway, I'm aware of all the stuff you go through about CPU threads vs. I/O
threads and as far as I can tell, we have the same understanding of those
issues. Given that, I'm confused that you end your post with the following:

Peter Duniho said:
There's not a single one right way to do threading. It does depend on
your specific task. But for CPU-bound tasks, it is _definitely_
counter-productive to simply queue a large number of tasks and let the
thread pool sort it out. You can get much more efficient throughput by
making sure you never have more runnable threads than you have CPUs.

I agree that threading is hard and I certainly won't claim to be a master in
the field. However, I cannot see why you would gain an advantage by doing
what you describe here.

Assume we have 10 CPU bound tasks (non-blocking and short running) and 2
available CPUs. In this case the thread pool will schedule the tasks to run
on 2 CPUs and thus not create additional threads thereby reducing the cost
of switching between threads. On the other hand, if you create 10 threads and
let each of them run one of the tasks, you not only pay the price of
creating additional threads, you will also end up with a lot of context
switches which is pure overhead (assuming of course that the tasks cannot be
completed within a single time slice).

If the goal is to complete all tasks as fast as possible, it seems to me
that the thread pool offers a pretty good deal.
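
For anyone who wants to see what the pool does on their own machine, here is a small sketch (not taken from either poster) that queues a number of short CPU-bound work items and counts how many distinct pool threads ran them. The PoolDemo name and the size of the busy loop are arbitrary assumptions, and the result will vary by machine and runtime version.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class PoolDemo
{
    // Queues taskCount short CPU-bound work items and returns how many
    // distinct thread pool threads ended up running them.
    public static int Run(int taskCount)
    {
        var threadIds = new ConcurrentDictionary<int, bool>();
        using (var done = new CountdownEvent(taskCount))
        {
            for (int i = 0; i < taskCount; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    long sum = 0;
                    for (int n = 0; n < 5000000; n++) sum += n; // busy work
                    threadIds[Thread.CurrentThread.ManagedThreadId] = true;
                    done.Signal();
                });
            }
            done.Wait(); // block until every work item has signalled
        }
        return threadIds.Count;
    }

    // Returns the pool's current minimum number of worker threads.
    public static int MinWorkers()
    {
        int workers, io;
        ThreadPool.GetMinThreads(out workers, out io);
        return workers;
    }
}
```

Comparing PoolDemo.Run(10) with Environment.ProcessorCount and PoolDemo.MinWorkers() gives a rough idea of how many threads the pool used for the 10-task scenario. It does not measure context-switch cost, so it illustrates the question rather than settling it.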
 