Workaround for WaitHandle limitation?

Guest · Nov 4, 2006

I'm doing some analysis that is readily broken up into many independent
pieces, on a multicore machine.

I thought it would be best to just queue like 1000 of these pieces in the
ThreadPool, and let that object take care of running them in the background
on the machine's free cycles.

But I need the program to wait for all the queued pieces to finish before it
continues. I tried giving each work item a ManualResetEvent and using
WaitHandler to detect when they've all finished, but it turns out that
WaitHandle.WaitAll() won't take more than 64 WaitHandlers.

What is the right way to detect that a process's ThreadPool has finished its
queue when we're looking at queues of this size?

Guest · Nov 4, 2006

Hi Dave,
creating a 1000 threads is a lot of overhead, also you should not tie up
threads in the thread pool for long periods of time they are really only to
be used for quick tasks.

I would suggest that you create a number of your own threads, not 1000 but
maybe 10 / 20, these threads will then consume from where ever you data is
being produced, each thread will process some of the data and then place the
processed data into a list or some structure once it is done, it can then
return back and process another piece of the data. Once all of the pieces
have been processed you can they signal on a single WaitHandle which will
then allow the rest of your program to move forward.

Mark.

Guest · Nov 4, 2006

Thanks, that makes sense.

My remaining question, then, is whether I can automatically scale the number
of threads being used at any given time by the CPU idle capacity on the
computer? I want to run this on servers as a "low-priority" process, but one
which will exploit all the spare CPU capacity when the server isn't doing
anything else.

I'm spinning off the calculation pieces for asynchronous execution using
..BeginInvoke(). Setting my process as low-priority doesn't seem to do the
trick -- if I have too many threads running it will still encroach somewhat
on higher-priority tasks.

So I think an ideal solution would be one in which I can detect current CPU
usage (and add threads when I see it dropping below some threshold). Is
something like that possible in .NET?

Guest · Nov 4, 2006

Suggest you have a look at Ami Bar's "SmartThreadPool", which you can find
over at codeproject.com. He's gotten around the 64 problem quite elegantly.
However, Marks' comment would still be appropriate.
Peter

Peter Duniho · Nov 5, 2006

Dave Booker said:
[...]
I'm spinning off the calculation pieces for asynchronous execution using
.BeginInvoke(). Setting my process as low-priority doesn't seem to do the
trick -- if I have too many threads running it will still encroach
somewhat
on higher-priority tasks.

So I think an ideal solution would be one in which I can detect current
CPU
usage (and add threads when I see it dropping below some threshold). Is
something like that possible in .NET?

You've got a couple of issues (in theory anyway...see my last paragraph for
why this may not be true in practice

). One is that as you increase the
number of threads, more of *your* stuff gets CPU time, since each thread is
treated mostly equally, with the same timeslice duration. Low priority
threads still get run. Higher priority threads should, of course, get
preferential treatment, but they won't normally starve the other threads
completely, and if there are LOTS of "the other threads" they can still
consume a significant amount of CPU time, taking away from other threads,
even higher priority ones.

Another issue is that if you have more threads than execution units (CPUs,
cores, whatever), then all of *your* threads compete with each other for the
CPU time. Since you have a finite amount of CPU time, and since you need
for *all* of your threads to complete, there's no point at all in having all
those threads fighting with each other. It just slows the whole thing down.

In a perfect world, the optimal number of threads would be equal to the
number of execution units. In reality, your work may involve something
other than just pure CPU execution and so a worker thread could wind up
getting blocked on that other something. If that's the case, it makes sense
to have some other worker threads so that they can take over and get some
work done if one of the other worker threads blocks. (Assuming, of course,
there aren't other processes on the system that take precedence).

So, in other words...IMHO the most appropriate solution is to check the
number of CPUs, and then start up something like 1.5 or 2 times that number
in worker threads (rounded up to a whole number, of course

). This
should address your desire to ensure that the CPU is not left idle, without
adding an excessive number of threads and causing overall performance
problems.

Now, all that said, it's not clear to me that you are actually creating
multiple threads. BeginInvoke should just post delegates to be run on your
main GUI thread. Perhaps you could clarify why it is you believe that you
are actually creating multiple threads for your processing work?

Pete

Guest · Nov 5, 2006

Pete, it sounds like you have the right answers, so let me clarify the
question:

Apparently I should avoid using the word "ThreadPool" in reference to my
problem. My calculation pieces are essentially 100% CPU: i.e., assume that
there is no I/O or any other waiting from the time a piece is Invoke()'d to
the time it ends. So as you point out ideally I would have exactly 1 piece
running for each free Execution Unit in the computer. (I'm testing on a nice
dual-Xeon-DP-HT server = 8 Execution Units.)

Maybe I should not use the word "Thread" to describe the parallelism I am
using. I am definitely creating parallel execution, but I am doing it only
by using a delegate .BeginInvoke(). Let me know if I need to be using a
different word than "thread" for parallel execution within a single process
by this means.

I know ahead of time the maximum number of Execution Units on a machine. I
want up to that many threads running simultaneously, BUT I want them to be
completely preempted by any other machine process. And, as you noted and I
have seen, once I start them, no matter how low their process priority is
they will not completely get out of the way when a higher-priority task comes
along.

My ideal solution would look like the following: I start my application and
let it know its maximum threadcount. It looks at the CPU load and when it
sees more than 1 (or n) Execution Unit free it starts a new thread. When any
thread finishes, it again checks the CPU load and it won't start another
unless more than 1 CPU unit is free. That way it will always leave at least
1 Execution Unit free for other processes. I.e., my application will never
be in the way of anyone else for longer than it takes a single calculation
piece to return.

The beauty of this is that I could maximize the usage on some very capable
machines that have tons of idle time. Otherwise, I have to limit my
application to my estimate of the minimum idle time on the machine, or else I
will impact more urgent processes.

Peter Duniho · Nov 5, 2006

Dave Booker said:
Pete, it sounds like you have the right answers, so let me clarify the
question:

Apparently I should avoid using the word "ThreadPool" in reference to my
problem. My calculation pieces are essentially 100% CPU: i.e., assume
that
there is no I/O or any other waiting from the time a piece is Invoke()'d
to
the time it ends. So as you point out ideally I would have exactly 1
piece
running for each free Execution Unit in the computer. (I'm testing on a
nice
dual-Xeon-DP-HT server = 8 Execution Units.)

Maybe I should not use the word "Thread" to describe the parallelism I am
using. I am definitely creating parallel execution, but I am doing it
only
by using a delegate .BeginInvoke().

I don't know. Are you using the thread pool or not? Are you using a
separate thread or not?

I'm probably the wrong person to be giving specifics on the .NET side of
things. I'm relatively new to .NET myself, though I've got plenty of
Windows API programming experience. If there's a way to use BeginInvoke to
start up a new thread or to cause a thread in the thread pool to execute
your delegate, then I think the word "thread" is fine in this context.

Conversely, if "thread" is not a fine word in this context, I don't see how
you could possibly be "creating parallel execution". That's the only way to
have code running simultaneously within a process. There's no other way in
Windows.

Let me know if I need to be using a
different word than "thread" for parallel execution within a single
process
by this means.

If you can be more specific about the exact means you use to start the
parallel execution (eg post some code), then answering that question would
be easier. Absent that, I'm uncomfortable trying to state one way or the
other whether the word "thread" applies here.

I know ahead of time the maximum number of Execution Units on a machine.
I
want up to that many threads running simultaneously, BUT I want them to be
completely preempted by any other machine process. And, as you noted and
I
have seen, once I start them, no matter how low their process priority is
they will not completely get out of the way when a higher-priority task
comes
along.

That's true. But it is also my opinion that as long as you don't have too
many, and as long as they are low enough priority, they should not cause any
serious impediment to other processes on the same computer. Of course, I
have no idea what those other processes are, or how much CPU time they can
afford to share with you. So I could be completely wrong about that.

Still, given the difficulty in addressing the problem in the way you
suggest, it seems like it would be worth a try to just use fewer threads.
Since your work is entirely CPU bound, you should be able to get by with
just one thread per CPU, rather than the extras I mentioned earlier. Set
them to the lowest possible priority, and see if that allows other stuff to
work okay. If not, *then* look at the harder, more complicated solution.

As far as your specific idea goes...

My ideal solution would look like the following: I start my application
and
let it know its maximum threadcount. It looks at the CPU load and when it
sees more than 1 (or n) Execution Unit free it starts a new thread. When
any
thread finishes, it again checks the CPU load and it won't start another
unless more than 1 CPU unit is free. That way it will always leave at
least
1 Execution Unit free for other processes. I.e., my application will
never
be in the way of anyone else for longer than it takes a single calculation
piece to return.

The beauty of this is that I could maximize the usage on some very capable
machines that have tons of idle time. Otherwise, I have to limit my
application to my estimate of the minimum idle time on the machine, or
else I
will impact more urgent processes.

The thing is, it seems to me that if the other processes on the system are
*so* urgent that you cannot afford to have your lower priority tasks run at
all if something else needs the CPU, then you just can't use that system to
run your tasks. Even the solution you mention above does not always avoid
using the CPU when another process may want it. It just attempts to
minimize how much it gets in the way.

On the other hand, if it's okay for your tasks to occasionally take CPU time
that could otherwise be applied to other processes, then that seems like the
perfect definition for running your tasks as low priority threads, with
enough to keep all of the CPUs busy if and when they are all idle. The
additional complication of checking the current CPU load adds complexity
without much if any gain.

Of course, there is also the technical issue of getting accurate CPU load
information. I don't know the mechanism off the top of my head that you'd
use to do this, though I'd guess there's a performance counter you can get
at to tell you this. But the best you can do is learn the CPU load in the
recent past. Any data you get will be historical, and there's no way to
know what the CPU load is going to be like when you get one of your threads
running. I haven't given it much thought, but it would not even surprise me
if there was some sort of mode that you could get yourself into where your
own tasks ramp up their execution at exactly the wrong moments every single
time, because of some interaction between your measurement of CPU load and
the other running processes.

And of course there is the issue of using up additional CPU time just
*managing* your tasks, rather than letting the OS thread scheduler take care
of that for you. If you decide that taking CPU from other tasks is okay to
some limited degree, but not enough that you can risk running one low
priority thread per CPU, I'd think a much better solution would be to run
some number of threads as a fraction of the total CPUs available. With 8
execution units, for example, run 4 or 6 threads. That way you know that
there will always be *some* execution units not assigned to your task.
You'll sacrifice throughput on your tasks, but it seems to me that that's
the cost of ensuring other processes always take precedence no matter what
(and frankly, even then you'll wind up taking *some* CPU time that otherwise
could have been applied to the other processes).

Please forgive me if I'm just not "seeing the light" with respect to your
proposed solution. It just seems to me that you have a self-contradictory
set of goals, and that you need to decide whether it's okay for your
low-priority tasks to take *any* time from other processes. If it is, then
just let the Windows thread scheduler take care of that. If it's not, then
your tasks don't belong on the computers where those critical processes
exist in the first place. In neither case does it seem like trying to track
CPU load and manage a pool of threads in response to that would be useful.

I'm a big fan of "keep it simple" (aka "KISS"

). Computer systems today
are already plagued by unnecessary complexity. I'd think twice before
contributing to the problem.

Pete

Jon Skeet [C# MVP] · Nov 5, 2006

Dave Booker said:
Maybe I should not use the word "Thread" to describe the parallelism I am
using. I am definitely creating parallel execution, but I am doing it only
by using a delegate .BeginInvoke(). Let me know if I need to be using a
different word than "thread" for parallel execution within a single process
by this means.

You're definitely using threads, and you are in fact using the
threadpool. I believe the problem is just in communication here - when
you call Control.BeginInvoke, that schedules a delegate for execution
on the main GUI thread - that's what Peter thought you meant.

When you call BeginInvoke *on* a delegate, that schedules it for
execution in the ThreadPool.

I would echo Mark's recommendation of what is essentially a
producer/consumer relationship. Start as many threads as you've got
core (or as many as you want to use) as "consumers" and have a
"producer" adding work items to the queue.

See http://www.pobox.com/~skeet/csharp/threads/deadlocks.shtml (half
way down the page) for an example implementation of a producer/consumer
queue.

Carl Daniel [VC++ MVP] · Nov 5, 2006

Peter said:
I don't know. Are you using the thread pool or not? Are you using a
separate thread or not?

delegate.BeginInvoke causes an item to be queued to the thread pool -
definitely using another thread.

-cd

Guest · Nov 5, 2006

Abbreviated Question: Can we use C# code to determine the peak CPU load
within the last x seconds (using a sample period of y milliseconds, if need
be)?

Exposition:

I am definitely a big fan of keeping it simple, but I believe it is very
worthwhile to go the extra mile on this problem: After all, there are tons
of scientific computing projects out there that want to take advantage of all
the free cycles floating around the worldâ€™s computers. I think my scenario
is very common: You have a worthwhile computational task thatâ€™s massively
parallelizable, and a server with plenty of free cycles that people are
willing to let you use. But they wonâ€™t let you use it if your task starts to
really get in the way of the processes for which the server was actually
purchased.

Letâ€™s take a concrete example: I have 10000 independent computation pieces
that essentially are 100% CPU tasks and that will consume 1 CPU-thread-minute
before returning. Letâ€™s say I have an 8-CPU-thread database server whose CPU
is sitting completely idle 90% of the time. So I would like to crunch up to
8 pieces at a time on that machine. But if the database wants the CPU for
something, I had better get out of its way within 10 seconds or else the
users will start to complain.

Itâ€™s true that I could pick a small number â€“ like 2 in this case â€“ and just
run that many threads. That way in the absolute worst case Iâ€™m only
impacting server performance by something like 25%, and in practice nobody
would ever notice. But meanwhile I have given up the 90% of the time that I
could run up to 8 threads, so Iâ€™m throwing away up to 70% of my computing
resources! (This can make a big difference when weâ€™re talking about queuing
2 months of CPU time!)

Hereâ€™s what we actually want to happen: When I start my process the CPU is
idle, so I start 6 threads. That way thereâ€™s a 2-thread margin for the
database and other system processes to use. On average my 6 threads will
return every 10 seconds, so if I see that the CPU is 100% occupied at any
recent time I can drop a thread within 10 seconds. When I see that the CPU
is less than 75% utilized I can add a thread.

Hopefully youâ€™ll agree that this scenario is both realistic and worthwhile.

Now, it looks like we could pull it off if we can find a way for the C# code
to detect the peak CPU load within the last x seconds. This shouldnâ€™t be
difficult â€“ after all, I can open the Task Manager Performance tab and
visually answer that question from the CPU Usage graphs.

(I had hoped the O/S would have provided a more elegant mechanism to
prioritize threads -- e.g., offering something like a "preemptible" thread
flag that means the thread doesn't care when it's executed, so it will never
interfere with another thread's execution. But this brute-management
approach is certainly better than nothing.)

:
....

Peter Duniho · Nov 5, 2006

Carl Daniel said:
delegate.BeginInvoke causes an item to be queued to the thread pool -
definitely using another thread.

Okay...as Jon surmised, I got confused about the Control.BeginInvoke vs the
method directly on a delegate.

So I would say "thread" is the right term to use, in this case.

Peter Duniho · Nov 5, 2006

Dave Booker said:
[...]
Here's what we actually want to happen: When I start my process the CPU
is
idle, so I start 6 threads. That way there's a 2-thread margin for the
database and other system processes to use. On average my 6 threads will
return every 10 seconds, so if I see that the CPU is 100% occupied at any
recent time I can drop a thread within 10 seconds. When I see that the
CPU
is less than 75% utilized I can add a thread.

Hopefully you'll agree that this scenario is both realistic and
worthwhile.

I did understand your goal the first time. So unfortunately, no...I can't
say that reiterating it has changed my mind about the solution you're aiming
for. If anything, providing a specific example makes me question the goal
that much more.

After all, if you can tolerate a 10 second latency in the server to return
control to the higher priority tasks, I just can't imagine that a low
priority thread that's running all the time could cause any real issues with
the performance of the server overall.

I have to say, I have a hard time imagining a server in which a 10 second
latency is ever acceptable anyway. But assuming it is, I can't imagine that
such a server has any critical need for completely optimal performance
anyway.

If your example had been for a 1/10th of a second latency, or something
along those lines, I might be more willing to accept the desirability of the
solution, even if not the degree of how realistic it would be to anticipate
implementing such a solution. But at 10 seconds, there's no question that
you're not dealing with a server that cannot afford to share even a tiny
fraction of the CPU time. Which, by the way, is about all a handful of low
priority threads would consume...really, until you've actually tried that
version of the solution and can quantify just how much latency for the
primary server tasks is lost to your secondary tasks, I think it's premature
for you to say you MUST have the more complex solution.

That said, I hope that whatever solution you do choose to attempt to
implement, you've found useful input in this message thread.

Pete

Guest · Nov 5, 2006

OK, I guess that's a reasonable argument: If we can tolerate any
interference from these "preemptible" threads, then we should be able to
tolerate the interference they create as fulltime background threads.

So let me just press my other assumption one more time: Is there in fact no
O/S, CLR, or other facility for running a thread as "fully-preemptible,"
meaning it will never contend with a non-preemptible thread for CPU time?

Peter Duniho said:
Dave Booker said:

[...]
Here's what we actually want to happen: When I start my process the CPU
is
idle, so I start 6 threads. That way there's a 2-thread margin for the
database and other system processes to use. On average my 6 threads will
return every 10 seconds, so if I see that the CPU is 100% occupied at any
recent time I can drop a thread within 10 seconds. When I see that the
CPU
is less than 75% utilized I can add a thread.

Hopefully you'll agree that this scenario is both realistic and
worthwhile.

Click to expand...

I did understand your goal the first time. So unfortunately, no...I can't
say that reiterating it has changed my mind about the solution you're aiming
for. If anything, providing a specific example makes me question the goal
that much more.

After all, if you can tolerate a 10 second latency in the server to return
control to the higher priority tasks, I just can't imagine that a low
priority thread that's running all the time could cause any real issues with
the performance of the server overall.

I have to say, I have a hard time imagining a server in which a 10 second
latency is ever acceptable anyway. But assuming it is, I can't imagine that
such a server has any critical need for completely optimal performance
anyway.

If your example had been for a 1/10th of a second latency, or something
along those lines, I might be more willing to accept the desirability of the
solution, even if not the degree of how realistic it would be to anticipate
implementing such a solution. But at 10 seconds, there's no question that
you're not dealing with a server that cannot afford to share even a tiny
fraction of the CPU time. Which, by the way, is about all a handful of low
priority threads would consume...really, until you've actually tried that
version of the solution and can quantify just how much latency for the
primary server tasks is lost to your secondary tasks, I think it's premature
for you to say you MUST have the more complex solution.

That said, I hope that whatever solution you do choose to attempt to
implement, you've found useful input in this message thread.

Pete

Peter Duniho · Nov 5, 2006

Dave Booker said:
OK, I guess that's a reasonable argument: If we can tolerate any
interference from these "preemptible" threads, then we should be able to
tolerate the interference they create as fulltime background threads.

Well, that's what I think anyway.

So let me just press my other assumption one more time: Is there in fact
no
O/S, CLR, or other facility for running a thread as "fully-preemptible,"
meaning it will never contend with a non-preemptible thread for CPU time?

If you believe the Windows API documentation, all you need to do is set the
thread to THREAD_PRIORITY_IDLE, and you will be guaranteed that normal
priority threads on the system will run normally. The docs even say that if
a higher priority thread becomes runnable, a lower priority thread will be
preempted to allow the higher priority thread to run.
http://msdn.microsoft.com/library/en-us/dllproc/base/scheduling_priorities.asp

Presumably the .NET implementation of threads allows you to change the
thread priority, and so should provide this same functionality. I haven't
checked myself, so I can't say for sure.

I admit, I'm not an expert on thread scheduling on Windows and so I can't
vouch for the accuracy of the documentation. But I don't see any reason to
doubt it.

Note that, after reviewing the docs again, it appears to me that in Vista
they've added a THREAD_MODE_BACKGROUND_BEGIN and THREAD_MODE_BACKGROUND_END
value for the SetThreadPriority function. I haven't found a good
elaboration on what this does exactly, but one mention of the flag suggests
that it provides a way to ensure that low priority threads don't interfere
with higher priority threads that compete for i/o resources. I'm not sure
this would have any bearing on your tasks, since you say they are entirely
CPU bound, but if you have trouble with the idle priority threads still
slowing down higher priority threads, you might see if using the background
mode values helps.

Pete

Willy Denoyette [MVP] · Nov 6, 2006

Don't know exactly what you mean with "non-preemptible" threads, Windows
threads can always be pre-emted even threads running at real-time priority
level are pre-emptible.
And 'runable' threads running at lower priority are getting a chance to run
when higher priority threads tend to monopolize the CPU(s), this is done in
order to prevent starvation of low priority threads. The workstation
scheduler will wake low priority runables every ~5 secs, don't know about
the server scheduler but I guess it's the same interval value.

Willy.

| OK, I guess that's a reasonable argument: If we can tolerate any
| interference from these "preemptible" threads, then we should be able to
| tolerate the interference they create as fulltime background threads.
|
| So let me just press my other assumption one more time: Is there in fact
no
| O/S, CLR, or other facility for running a thread as "fully-preemptible,"
| meaning it will never contend with a non-preemptible thread for CPU time?
|
|
| "Peter Duniho" wrote:
|
| > | > > [...]
| > > Here's what we actually want to happen: When I start my process the
CPU
| > > is
| > > idle, so I start 6 threads. That way there's a 2-thread margin for
the
| > > database and other system processes to use. On average my 6 threads
will
| > > return every 10 seconds, so if I see that the CPU is 100% occupied at
any
| > > recent time I can drop a thread within 10 seconds. When I see that
the
| > > CPU
| > > is less than 75% utilized I can add a thread.
| > >
| > > Hopefully you'll agree that this scenario is both realistic and
| > > worthwhile.
| >
| > I did understand your goal the first time. So unfortunately, no...I
can't
| > say that reiterating it has changed my mind about the solution you're
aiming
| > for. If anything, providing a specific example makes me question the
goal
| > that much more.
| >
| > After all, if you can tolerate a 10 second latency in the server to
return
| > control to the higher priority tasks, I just can't imagine that a low
| > priority thread that's running all the time could cause any real issues
with
| > the performance of the server overall.
| >
| > I have to say, I have a hard time imagining a server in which a 10
second
| > latency is ever acceptable anyway. But assuming it is, I can't imagine
that
| > such a server has any critical need for completely optimal performance
| > anyway.
| >
| > If your example had been for a 1/10th of a second latency, or something
| > along those lines, I might be more willing to accept the desirability of
the
| > solution, even if not the degree of how realistic it would be to
anticipate
| > implementing such a solution. But at 10 seconds, there's no question
that
| > you're not dealing with a server that cannot afford to share even a tiny
| > fraction of the CPU time. Which, by the way, is about all a handful of
low
| > priority threads would consume...really, until you've actually tried
that
| > version of the solution and can quantify just how much latency for the
| > primary server tasks is lost to your secondary tasks, I think it's
premature
| > for you to say you MUST have the more complex solution.
| >
| > That said, I hope that whatever solution you do choose to attempt to
| > implement, you've found useful input in this message thread.

| >
| > Pete
| >
| >
| >

Workaround for WaitHandle limitation?

Guest

Guest

Guest

Guest

Peter Duniho

Guest

Peter Duniho

Jon Skeet [C# MVP]

Carl Daniel [VC++ MVP]

Guest

Peter Duniho

Peter Duniho

Guest

Peter Duniho

Willy Denoyette [MVP]