FileSystemWatcher question

N

Nick

Hello,

I've got a FileSystemWatcher object setup to check for new files in a
folder. I've never done much with threading, but I think I mght need to here.
When I find a new file there are times when I need to do some time consuming
tasks. When this happens, as it is now, the FileSystemWatcher doesn't seem to
see all of the files if I move a large batch of files in the watch folder at
once. Could someone give me a push in the right direction? Thanks for any
help, I really appreciate it.

Thanks,
Nick
 
P

Peter Morris

public class FileProcessor
{
private object SyncRoot = new object();
private Queue<string> Queue = new Queue<string>;
private bool Terminated = false;

public FileProcessor()
{
var threadStart = new ThreadStart(Execute);
new Thread(threadStart).Start();
}

public void QueueFile(string fileName)
{
lock (SyncRoot)
{
Queue.Enqueue(fileName);
Monitor.Pulse(SyncRoot);
}
}

private void Execute()
{
while (!Terminated)
{
string fileName = null;
lock(SyncRoot)
{
if (Queue.Count > 0 && !Terminated)
fileName = Queue.Dequeue();
}

if (fileName != null)
ProcessFile(fileName);
else
Monitor.WaitFor(SyncRoot);
}
}

public void Terminate()
{
Terminated = true;
lock (SyncRoot)
{
Monitor.Pulse(SyncRoot);
}
}



}


Something like that anyway. Probably wont even compile, but you never know
your luck :)
 
P

Peter Morris

Hi Pete

Any chance you could explain a few things?
private object SyncRoot = new object();
private Queue<string> queue = new Queue<string>;

public FileProcessor()
{
new Thread(Execute).Start();
}

Yeah, I was tempted to do the same thread start but couldn't remember if I
needed ThreadStart or not, and I wasn't willing to start up VS :_)

public void QueueFile(string fileName)
{
if (fileName == null)
{
throw new ArgumentNullException("fileName");
}

hehe. You work too hard :) For newsgroups I just can't be bothered with
stuff like this, it's bad enough that I have to write it in my real code let
alone newsgroups. I suppose I should try to be a perfectionist in NG source
as well as my own, but at the end of the day I think the person reading has
the responsibility to learn some good techniques of their own rather than
responders being responsible for teaching them.
private void Execute()
{
lock (SyncRoot)
{
while (true)

This is where I am starting to disagree with you :)

{
if (queue.Count > 0)
{
string fileName = queue.Dequeue();

if (fileName == null)
{
break;

That's it, we are now officially in disagreement :) Why would you go for
such a vague way of terminating the thread rather than my crystal clear
"Terminated" member?

Other than the change to how I terminate the thread + the check for null I
don't see how yours is functionally different from mine, what were your
thoughts at the time of reading my source?
// Subscribe to the FileFound event to be able to process filenames
// registered with this class
public event Action<string> FileFound;

private void ProcessFile(string fileName)
{
Action<string> handler = FileFound;

if (handler != null)
{
handler(fileName);
}
}
}

I think I see now! You are showing him how to handle ProcessFile! There
was just no way I was going to bother :)

To the OP: I think the point of the above is that Pete M. is suggesting
that you don't do lengthy processing in your event handler(s) for the
FileSystemWatcher event(s), but rather just queue filenames to a separate
consumer class and let it handle things as it can. The above would allow
that.

Exactly. In my service that does something similar I have a class that
decides what to do with the file based on its location + extension. It goes
into one of three of these queues based on how long the operation takes

Fast - Typically moves the file elsewhere
Intermediate - Typically unzips a small file
Long - Generates a MySQL db and zips it

So that the long process doesn't hold up the stuff that should occur
immediately despite the fact that there is a long term operation in
progress.
FileSystemWatcher class to notice them all (it relies on a fixed-sized
buffer than if filled before additions can be processed will cause some
notifications to be lost).

Any idea what size this buffer is? For me there will be a maximum of 10
people connecting at once so I doubt I will need to be concerned, but it's
always interesting to know.
 
P

Peter Morris

Hi Pete

Why do you say it is vague? There's no ambiguity at all here.

In your case I see a null entry in the list as a kind of "magic number"
approach. If the value is null you end, or maybe you would put in the
string "STOP" or "FINISH" or "QUIT". My point is that stopping based on the
value of an entry is vague because what that value should be is entirely up
to the person writing the code, different people would have different
opinions on what that value should be. However, having a Terminated boolean
member to indicated that the Terminate has been called is obvious, the value
will be "true" if terminated, it is indisputible.

I've removed a variable for one. With the removal of that variable, not
only have I saved as many as 4 bytes from the size of my class, I have
avoided the risk of making the mistake you did: failing to either put the
use of that variable inside a synchronized block of code, or making the
variable "volatile". :)

Yes, I did miss that, didn't I :) I'd still rather go with the boolean
approach though :)
Assuming one fixes your bug, there is no _functional_ difference between
the two.

Except when you terminate :)

My approach will stop processing items in the loop immediately, whereas your
approach will finish processing the items before exiting. Your
implementation is more of a "FinishFirstThenTerminate", whereas mine is a
"TerminateImmediately" one. Using my approach I can easily add an
additional condition

if (Terminated && Queue.Count == 0)
break;

if I were to desire the same behaviour as you, whereas using the magic value
approach you cannot easily implement a TerminateImmediately approach without
looking through the list each time.

But, I find my way simpler and thus more maintainable.

Unless you want to terminate immediately. I must confess I didn't spot this
difference immediately, it was only as I went into a little daydream :)
lol...well, in that case you just guarantee that your code example
wouldn't compile. :)

To qualify for compiling code from me one of the following criteria needs to
be met.
01: I have already written it and can copy/paste
02: I get paid :)

I'm more of a "here's the general idea, now write it youself" kind of
helper. The best lesson you can learn is how to teach yourself :)

No, I don't know. Last time I looked at this, I did take a look to see if
it was documented, but couldn't find any specifics.

I should be okay with around 10 at the time time, maximum, I hope! This
should definitely be documented, and obvious!


Thanks for the discussion, I enjoy getting technical :)
 
N

Nick

Thanks again to both of you for the excellent discussion.

Peter Duniho said:
Hi Pete



In your case I see a null entry in the list as a kind of "magic number"
approach. If the value is null you end, or maybe you would put in the
string "STOP" or "FINISH" or "QUIT". My point is that stopping based on
the value of an entry is vague because what that value should be is
entirely up to the person writing the code, different people would have
different opinions on what that value should be. However, having a
Terminated boolean member to indicated that the Terminate has been
called is obvious, the value will be "true" if terminated, it is
indisputible.



Yes, I did miss that, didn't I :) I'd still rather go with the boolean
approach though :)


Except when you terminate :)

Ah, I see what you mean. Yes, you're right...there's a slight
difference. Though, I'll point out that one reason it's hard to detect
(you had trouble yourself :) ) is that you've got your "dequeue" mixed up
with your "terminate". :)

That behavior isn't precluded by using a sentinel value (and for the
record, sentinel values are quite common in software...there's nothing
fundamentally wrong with them), but you'd have to use an implementation of
Queue<T> that allows for inserting at the head (i.e. not .NET's
implementation). You could write your own, but I agree that doing so
_just_ to avoid the flag wouldn't make sense.

Also note that the code you posted doesn't actually behave exactly as you
say. In particular, this statement isn't true:
My approach will stop processing items in the loop immediately,

The code you posted doesn't terminate when you call Terminate(). It
simply goes back and waits again. It won't terminate until the _next_
time the monitor is pulsed (whether that happens due to another call to
Terminate() or someone trying to enqueue something).

Again, I think this illustrates how a simpler design is easier to code
correctly. By my count, we're up to three bugs in your implementation
just related to termination alone. I got my implementation right the
first try, and I don't think you can say it's because I'm a better
programmer. I'm not...I just know better than to complicate things when I
don't have to. :)

Now, if you have to -- that is, you have that specific requirement to
terminate immediately -- I suppose you have to implement it that way. But
neither of us have the actual specification to work with, and it's
entirely possible my version would work fine in this context.
[...]
I'm more of a "here's the general idea, now write it youself" kind of
helper. The best lesson you can learn is how to teach yourself :)

Well, I agree with that generally. In fact, you'll note that I rarely
actually post actual code. I prefer to just describe a solution, or point
someone to the documentation. But, I'm of the mind that when I do post
code, I should try to get it as close to correct as possible. Just
because I'm not being paid, that doesn't mean I shouldn't take pride in my
work. :)
I should be okay with around 10 at the time time, maximum, I hope! This
should definitely be documented, and obvious!

As long as in your case, 10 connections translates directly to no more
than 10 file system changes at once, then I think you're definitely safe.
That's an odd guarantee to be able to have, but if you have it, no
problem. :)

Pete
 
P

Peter Morris

Hi Pete
Also note that the code you posted doesn't actually behave exactly as you
say. In particular, this statement isn't true:


The code you posted doesn't terminate when you call Terminate(). It
simply goes back and waits again.

I am struggling to see how this is the case, a second ago I thought I had it
but then it went from me.


The closes I can get is thinking multi-threads. What happens if Terminate
is called at a specific point of execution by another thread.

while (!Terminated)
{
string fileName = null;
lock(SyncRoot)
{
if (Queue.Count > 0 && !Terminated)
fileName = Queue.Dequeue();
}

**HERE**

if (fileName != null)
ProcessFile(fileName);
else
Monitor.WaitFor(SyncRoot);
}

The thing is, if it is **HERE** before Terminate is called then it is there
because there is something to process, so WaitFor wont get called and after
processing the WHILE will evaluate to false and the loop will exit.

Can you explain the steps to me that would reproduce the behaviour you
describe? I've already benefitted a lot from our previous discussion on
this :)

Thanks

Again, I think this illustrates how a simpler design is easier to code
correctly. By my count, we're up to three bugs in your implementation
just related to termination alone. I got my implementation right the
first try, and I don't think you can say it's because I'm a better
programmer. I'm not...I just know better than to complicate things when I
don't have to. :)

I still think the loop above is flawless, I need convincing. At the moment
I would still definitely go for the approach I put in originally, possibly
line for line (except I would add the ProcessFile code obviously). I don't
know what the 3 errors were, you aren't counting the lack of ProcessFile are
you? That would be cheating, that was a deliberate omission and not a
logical flaw.

Now, if you have to -- that is, you have that specific requirement to
terminate immediately -- I suppose you have to implement it that way. But
neither of us have the actual specification to work with, and it's
entirely possible my version would work fine in this context.

That is true, but my point was that using the explicit Terminated boolean
not only is it clear how to identify the fact that it has terminated but it
also makes it very easy to implement either scenario, whereas your approach
makes it much more work to implement the TerminateImmediately approach.

Well, I agree with that generally. In fact, you'll note that I rarely
actually post actual code. I prefer to just describe a solution, or point
someone to the documentation. But, I'm of the mind that when I do post
code, I should try to get it as close to correct as possible. Just
because I'm not being paid, that doesn't mean I shouldn't take pride in my
work. :)

I take pride too, and the code I posted works once you implement
ProcessFile. In the past I have spent many years on newsgroups taking the
trouble to boot up my IDE and write + debug code before posting it. I came
to two conclusions

01: It takes up a lot of your day, so you get to help less people or produce
less work.
02: The company that develops the IDE are no more grateful to you for the
extra effort.

and the additional opinion that anyone who can't work out how to implement
the simple missing details is too much of a novice and will be a real drain
on my time to help fully, and once you have finished helping them they will
simply go onto the next item they need spoon feeding with. So I expect we
will both continue to post source code in our own personal style :)



Regards
 
P

Peter Morris

I am aware of the "volatile" omission, but I don't know what the 2nd bug is
you are referring to :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top