Dave Booker said:
> Pete, it sounds like you have the right answers, so let me clarify the
> question:
> Apparently I should avoid using the word "ThreadPool" in reference to my
> problem. My calculation pieces are essentially 100% CPU: i.e., assume that
> there is no I/O or any other waiting from the time a piece is Invoke()'d
> to the time it ends. So as you point out, ideally I would have exactly 1
> piece running for each free Execution Unit in the computer. (I'm testing
> on a nice dual-Xeon-DP-HT server = 8 Execution Units.)
> Maybe I should not use the word "Thread" to describe the parallelism I am
> using. I am definitely creating parallel execution, but I am doing it
> only by using a delegate's .BeginInvoke().
I don't know. Are you using the thread pool or not? Are you using a
separate thread or not?
I'm probably the wrong person to be giving specifics on the .NET side of
things. I'm relatively new to .NET myself, though I've got plenty of
Windows API programming experience. If there's a way to use BeginInvoke to
start up a new thread or to cause a thread in the thread pool to execute
your delegate, then I think the word "thread" is fine in this context.
Conversely, if "thread" is not a fine word in this context, I don't see how
you could possibly be "creating parallel execution". That's the only way to
have code running simultaneously within a process. There's no other way in
Windows.
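For what it's worth, in the .NET Framework a delegate's BeginInvoke does both of the things mentioned above at once: it causes a ThreadPool worker thread to execute the delegate, so "thread" is the right word either way. A minimal sketch of the pattern under discussion (Square is a hypothetical stand-in for one of the calculation pieces):

```csharp
using System;

class Demo
{
    // A CPU-bound piece of work. Calling BeginInvoke on a delegate that
    // targets it queues the call to a ThreadPool thread (classic .NET
    // Framework asynchronous-delegate pattern; not supported on .NET Core).
    static int Square(int n)
    {
        return n * n;   // pretend this is a heavy calculation
    }

    static void Main()
    {
        Func<int, int> work = Square;

        // BeginInvoke returns immediately; the delegate runs on a pool thread.
        IAsyncResult ar = work.BeginInvoke(7, null, null);

        // ... the calling thread is free to do other things here ...

        // EndInvoke blocks until the piece finishes and returns its result.
        int result = work.EndInvoke(ar);
        Console.WriteLine(result);   // 49
    }
}
```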
> Let me know if I need to be using a different word than "thread" for
> parallel execution within a single process by this means.
If you can be more specific about the exact means you use to start the
parallel execution (e.g., post some code), then answering that question would
be easier. Absent that, I'm uncomfortable trying to state one way or the
other whether the word "thread" applies here.
> I know ahead of time the maximum number of Execution Units on a machine.
> I want up to that many threads running simultaneously, BUT I want them to
> be completely preempted by any other machine process. And, as you noted
> and I have seen, once I start them, no matter how low their process
> priority is, they will not completely get out of the way when a
> higher-priority task comes along.
That's true. But it is also my opinion that as long as you don't have too
many, and as long as they are low enough priority, they should not cause any
serious impediment to other processes on the same computer. Of course, I
have no idea what those other processes are, or how much CPU time they can
afford to share with you. So I could be completely wrong about that.
Still, given the difficulty in addressing the problem in the way you
suggest, it seems like it would be worth a try to just use fewer threads.
Since your work is entirely CPU bound, you should be able to get by with
just one thread per CPU, rather than the extras I mentioned earlier. Set
them to the lowest possible priority, and see if that allows other stuff to
work okay. If not, *then* look at the harder, more complicated solution.
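The simpler approach suggested here — one explicitly created thread per execution unit, each at the lowest priority — might look something like the following sketch. CalculationPiece is a hypothetical stand-in for the real CPU-bound work:

```csharp
using System;
using System.Threading;

class LowPriorityWorkers
{
    // Stand-in for one 100% CPU-bound calculation piece.
    static void CalculationPiece(object id)
    {
        // ... heavy computation goes here ...
    }

    static void Main()
    {
        // One thread per execution unit (8 on the dual-Xeon-HT box), each at
        // the lowest thread priority so the OS scheduler lets any other
        // runnable work preempt them.
        int count = Environment.ProcessorCount;
        var threads = new Thread[count];
        for (int i = 0; i < count; i++)
        {
            threads[i] = new Thread(CalculationPiece);
            threads[i].Priority = ThreadPriority.Lowest;
            threads[i].IsBackground = true;   // don't keep the process alive
            threads[i].Start(i);
        }

        // Optionally also drop the whole process's priority:
        // Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.Idle;

        foreach (var t in threads)
            t.Join();
    }
}
```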
As far as your specific idea goes...
> My ideal solution would look like the following: I start my application
> and let it know its maximum thread count. It looks at the CPU load, and
> when it sees more than 1 (or n) Execution Units free it starts a new
> thread. When any thread finishes, it again checks the CPU load, and it
> won't start another unless more than 1 Execution Unit is free. That way
> it will always leave at least 1 Execution Unit free for other processes.
> I.e., my application will never be in the way of anyone else for longer
> than it takes a single calculation piece to return.
> The beauty of this is that I could maximize the usage on some very capable
> machines that have tons of idle time. Otherwise, I have to limit my
> application to my estimate of the minimum idle time on the machine, or
> else I will impact more urgent processes.
The thing is, it seems to me that if the other processes on the system are
*so* urgent that you cannot afford to have your lower priority tasks run at
all if something else needs the CPU, then you just can't use that system to
run your tasks. Even the solution you mention above does not always avoid
using the CPU when another process may want it. It just attempts to
minimize how much it gets in the way.
On the other hand, if it's okay for your tasks to occasionally take CPU time
that could otherwise be applied to other processes, then that seems like the
perfect definition for running your tasks as low priority threads, with
enough to keep all of the CPUs busy if and when they are all idle. Checking
the current CPU load on top of that adds complexity without much, if any,
gain.
Of course, there is also the technical issue of getting accurate CPU load
information. I don't know off the top of my head what mechanism you'd use,
though I'd guess there's a performance counter you can get at that exposes
it. But the best you can do is learn the CPU load in the recent past. Any
data you get will be historical, and there's no way to
know what the CPU load is going to be like when you get one of your threads
running. I haven't given it much thought, but it would not even surprise me
if there was some sort of mode that you could get yourself into where your
own tasks ramp up their execution at exactly the wrong moments every single
time, because of some interaction between your measurement of CPU load and
the other running processes.
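The guess about a performance counter is right: Windows publishes overall CPU load through the "% Processor Time" counter, readable in .NET via System.Diagnostics.PerformanceCounter. It also illustrates the caveat above — the value is inherently historical, averaged over the interval between two samples. A sketch (Windows-only):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuLoad
{
    static void Main()
    {
        // "_Total" aggregates all execution units; per-CPU instances
        // ("0", "1", ...) are also available.
        using (var cpu = new PerformanceCounter(
            "Processor", "% Processor Time", "_Total"))
        {
            cpu.NextValue();        // the first sample always reads 0; prime it
            Thread.Sleep(1000);     // let the counter accumulate an interval
            float load = cpu.NextValue();   // % busy over the *past* interval
            Console.WriteLine("CPU load: {0:F1}%", load);
        }
    }
}
```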
And of course there is the issue of using up additional CPU time just
*managing* your tasks, rather than letting the OS thread scheduler take care
of that for you. If you decide that taking CPU from other tasks is okay to
some limited degree, but not enough that you can risk running one low
priority thread per CPU, I'd think a much better solution would be to run
some number of threads as a fraction of the total CPUs available. With 8
execution units, for example, run 4 or 6 threads. That way you know that
there will always be *some* execution units not assigned to your task.
You'll sacrifice throughput on your tasks, but it seems to me that that's
the cost of ensuring other processes always take precedence no matter what
(and frankly, even then you'll wind up taking *some* CPU time that otherwise
could have been applied to the other processes).
Please forgive me if I'm just not "seeing the light" with respect to your
proposed solution. It just seems to me that you have a self-contradictory
set of goals, and that you need to decide whether it's okay for your
low-priority tasks to take *any* time from other processes. If it is, then
just let the Windows thread scheduler take care of that. If it's not, then
your tasks don't belong on the computers where those critical processes
exist in the first place. In neither case does it seem like trying to track
CPU load and manage a pool of threads in response to that would be useful.
I'm a big fan of "keep it simple" (aka "KISS"). Computer systems today
are already plagued by unnecessary complexity. I'd think twice before
contributing to the problem.
Pete