Thread/architecture question

JS

I am monitoring/controlling some realtime activities in a manufacturing
process.

When a part comes into my station, I have a bunch of processing to do.
There are 30-40 data acquisition and data processing steps that need to
be performed. Many of the steps rely on the results of other steps for
their processing.

I have each step coded as a separate instance. The way I'd like to
execute all the steps is to spawn a thread for each of the steps. Each
step's thread would:

1. Wait for all dependent steps to complete (WaitHandle.WaitAll).
2. Perform the desired action
3. Set a ManualResetEvent to signify that this step is done.
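
Roughly, each step's thread body would look something like this (simplified
sketch only -- GetInputEvents, DoWork, SetAllOutputsReady and ReadyEvent are
illustrative names, not my real code):

// Sketch of one step running on its own thread.
private void RunStep()
{
    // 1. Block until every step we depend on has signalled completion.
    //    (Note that WaitAll is limited to 64 handles and can't be used
    //    with multiple handles on an STA thread.)
    WaitHandle[] inputs = this.GetInputEvents();
    if (inputs.Length > 0)
        WaitHandle.WaitAll(inputs);

    // 2. Perform the acquisition or processing for this step.
    this.DoWork();

    // 3. Mark the outputs ready and signal anyone waiting on this step.
    this.SetAllOutputsReady();
    this.ReadyEvent.Set();    // ReadyEvent is a ManualResetEvent
}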

Now my problem is that a) I have some serious realtime issues to
contend with, and b) I'd like to use a ThreadPool, but I think I can only
have 20 or so threads running in the .NET ThreadPool.

Should I write my own thread pool for this, or is a thread pool the
wrong way to go? By the way, I expect to have 5-8 threads running at
any given time with another 30-40 threads waiting for an event.
 
Jon Skeet [C# MVP]

JS said:
I am monitoring/controlling some realtime activities in a manufacturing
process.

When a part comes into my station, I have a bunch of processing to do.
There are 30-40 data acquisition and data processing steps that need to
be performed. Many of the steps rely on the results of other steps for
their processing.

I have each step coded as a separate instance. The way I'd like to
execute all the steps is to spawn a thread for each of the steps. Each
step's thread would:

1. Wait for all dependent steps to complete (WaitHandle.WaitAll).
2. Perform the desired action
3. Set a ManualResetEvent to signify that this step is done.

Now my problem is that a) I have some serious realtime issues to
contend with, and b) I'd like to use a ThreadPool, but I think I can only
have 20 or so threads running in the .NET ThreadPool.

Should I write my own thread pool for this, or is a thread pool the
wrong way to go? By the way, I expect to have 5-8 threads running at
any given time with another 30-40 threads waiting for an event.

There are various other threadpools around, including my own:
http://www.pobox.com/~skeet/csharp/miscutil

It doesn't sound like one thread per step is a good idea though. Can
you not work out which of the steps can actually be executed in
parallel, and which can't, and keep the number of threads created down
in that way?
 
JS

Yes I might be able to work out which steps can be executed at a given
time. However, a thread pool will do this for me just by having each
step wait on its own inputs.

Is there a reason to avoid a thread pool for this situation?

Here's some more background:
The user has complete control over the list of steps. There are
hundreds of possible steps to choose from. Each step may have a set of
inputs and/or a set of outputs. A step can't begin processing until
all its inputs are ready. Once a step is done with its processing, it
sets each of its outputs to the ready state.

A step is typically either a data acquisition step or a data processing
step.

Here's an example of a set of steps:

Acquire Motion Data : output={MotionData1}
Acquire Temperature Data : output={Temp1}
Check Temperature : input={Temp1}, output={TempOK1}
Check Motion Data : input={MotionData1}, output={MotionOK1}
Check for Emergency Stop : input={TempOK1, MotionOK1}, output={EStop1}

This is a very simple set of steps. In a real system, there would be
5-15 data acquisition steps and 20-50 data processing steps.
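
To make that concrete, a step in my model boils down to something like this
(illustrative sketch only, not the real code -- the class and member names
are made up):

// Each named output ("Temp1", "MotionOK1", ...) is backed by a
// ManualResetEvent shared between the steps that produce and consume it.
using System.Collections;
using System.Threading;

public class Step
{
    public string Name;
    public string[] InputNames = new string[0];     // e.g. { "TempOK1", "MotionOK1" }
    public string[] OutputNames = new string[0];    // e.g. { "EStop1" }
    public ManualResetEvent ReadyEvent = new ManualResetEvent(false);

    // Shared table mapping output name -> ManualResetEvent, built up front
    // from the user's list of steps.
    public static Hashtable OutputEvents = new Hashtable();

    public WaitHandle[] GetInputEvents()
    {
        WaitHandle[] handles = new WaitHandle[InputNames.Length];
        for (int i = 0; i < InputNames.Length; i++)
            handles[i] = (ManualResetEvent) OutputEvents[InputNames[i]];
        return handles;
    }

    public void SetAllOutputsReady()
    {
        foreach (string name in OutputNames)
            ((ManualResetEvent) OutputEvents[name]).Set();
    }
}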

I think I'll put together a short but complete program that details how
the software is written -- perhaps that will be instructive.

Thanks.
 
Jon Skeet [C# MVP]

JS said:
Yes I might be able to work out which steps can be executed at a given
time. However, a thread pool will do this for me just by having each
step wait on its own inputs.

Is there a reason to avoid a thread pool for this situation?

I can't see that a thread pool would actually help you. Thread pools
are good for lots of short-running tasks - not tasks which need to wait
until something else has happened first.
Here's some more background:
The user has complete control over the list of steps. There are
hundreds of possible steps to choose from. Each step may have a set of
inputs and/or a set of outputs. A step can't begin processing until
all its inputs are ready. Once a step is done with its processing, it
sets each of its outputs to the ready state.

A step is typically either a data acquisition step or a data processing
step.

Here's an example of a set of steps:

Acquire Motion Data : output={MotionData1}
Acquire Temperature Data : output={Temp1}
Check Temperature : input={Temp1}, output={TempOK1}
Check Motion Data : input={MotionData1}, output={MotionOK1}
Check for Emergency Stop : input={TempOK1, MotionOK1}, output={EStop1}

This is a very simple set of steps. In a real system, there would be
5-15 data acquisition steps and 20-50 data processing steps.

I think I'll put together a short but complete program that details how
the software is written -- perhaps that will be instructive.

Possibly. Are you actually sure that multi-threading will be useful
here at all? Are these definitely steps which can occur properly in
parallel anyway, using different resources? For instance, there's no
point (at least on single processor machines) in running two CPU-
intensive tasks at once.
 
JS

Yes, many of the data acquisition steps tell a PCI card (or other
resource) to begin data collection. Then the acquisition step waits
until the data is done. Data acquisition can take a variable amount of
time to complete, depending on how much data is acquired.

Here's a scenario for you:

1. Acquire dataset 1 (very large data set takes 50 ms to acquire)
2. Acquire dataset 2 (small data set takes 1 ms to acquire)
3. Process dataset 1 (takes 10ms)
4. Process dataset 2 (takes 30ms)

Time=0ms   Start acquiring datasets 1 and 2
Time=1ms   Start processing dataset 2
Time=31ms  Done processing dataset 2
Time=50ms  Start processing dataset 1
Time=60ms  Done processing dataset 1

In this very simple example the CPU is really only running on one
thread at a time. What's nice about using a thread pool for this is that
it might optimize the use of the CPU. That's what I'm hoping, anyway.
What is unknown to me is whether the threadpool overhead will be a
problem.

Additionally, I don't know what kind of steps I'll be executing.
Whereas the above example showed no more than 1 thread running (i.e.
not waiting) at a time, there are other possible cases. There may be 1
data acquisition step followed by 50 processing steps on this data.
This would result in 50 threads going at the same time. I guess I'm
trying to gauge how much performance I might lose if I spawn 50 threads
vs. having a single thread make 50 method calls -- this is the
worst-case scenario. Any idea on this?
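
One thing I could do is measure the overhead directly; a crude sketch like
this (names made up; Stopwatch needs .NET 2.0 or later) would at least give
me a ballpark for thread startup vs. plain method calls:

// Rough micro-benchmark: 50 new threads vs. 50 direct calls.
using System;
using System.Diagnostics;
using System.Threading;

class ThreadOverheadTest
{
    static void DoNothingStep() { /* stand-in for a step's work */ }

    static void Main()
    {
        const int count = 50;

        // Time starting and joining 50 short-lived threads.
        Stopwatch sw = Stopwatch.StartNew();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++)
        {
            threads[i] = new Thread(new ThreadStart(DoNothingStep));
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();
        sw.Stop();
        Console.WriteLine("50 threads: {0} ms", sw.ElapsedMilliseconds);

        // Time 50 plain method calls on a single thread for comparison.
        sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            DoNothingStep();
        sw.Stop();
        Console.WriteLine("50 calls:   {0} ms", sw.ElapsedMilliseconds);
    }
}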

BTW, I started writing a short but complete program, and it's not
ending up very short. I'll keep working on it, but I may have to
e-mail the source instead of posting it.
 
Stefan Simek

Hi,

I suggest you write a simple scheduler that maintains a list of all tasks
that haven't been started yet. Every time it runs, it would start (and remove
from the list) the tasks that have all their inputs available (on the first
run, the tasks without inputs would be started). You would then run the
scheduler again each time a new output becomes available, or whenever one of
the previous tasks finishes. This way you could make use of the threadpool,
though I wouldn't do so. If we're talking about 20-50 tasks, each of them
taking some time, the thread creation overhead should be insignificant, so
you might just start a new thread every time you need one and not worry about
draining the threadpool.
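
Something along these lines, as a rough, untested sketch (the Task class and
member names are only for illustration):

// A task exposes its named inputs/outputs and the work to run.
using System.Collections;
using System.Threading;

public class Task
{
    public string[] Inputs = new string[0];
    public string[] Outputs = new string[0];
    public virtual void Run() { }
}

public class SimpleScheduler
{
    private ArrayList pending;                          // tasks not yet started
    private Hashtable readyOutputs = new Hashtable();   // output name -> present when ready

    public SimpleScheduler(ICollection tasks)
    {
        pending = new ArrayList(tasks);
    }

    // Run once at the start, and again each time a task finishes.
    public void Schedule()
    {
        lock (this)
        {
            for (int i = pending.Count - 1; i >= 0; i--)
            {
                Task task = (Task) pending[i];
                if (!AllInputsReady(task))
                    continue;

                pending.RemoveAt(i);
                Task started = task;   // local copy for the delegate
                new Thread(new ThreadStart(delegate
                {
                    started.Run();
                    OnTaskFinished(started);
                })).Start();
            }
        }
    }

    private void OnTaskFinished(Task task)
    {
        lock (this)
        {
            foreach (string output in task.Outputs)
                readyOutputs[output] = true;
        }
        Schedule();   // a finished task may have unblocked others
    }

    private bool AllInputsReady(Task task)
    {
        foreach (string input in task.Inputs)
            if (!readyOutputs.ContainsKey(input))
                return false;
        return true;
    }
}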

HTH,
Stefan
 
Jon Skeet [C# MVP]

JS said:
Yes, many of the data acquisition steps tell a PCI card (or other
resource) to begin data collection. Then the acquisition step waits
until the data is done. Data acquisition can take a variable amount of
time to complete, depending on how much data is acquired.

Here's a scenario for you:

1. Acquire dataset 1 (very large data set takes 50 ms to acquire)
2. Acquire dataset 2 (small data set takes 1 ms to acquire)
3. Process dataset 1 (takes 10ms)
4. Process dataset 2 (takes 30ms)

Time=0ms   Start acquiring datasets 1 and 2
Time=1ms   Start processing dataset 2
Time=31ms  Done processing dataset 2
Time=50ms  Start processing dataset 1
Time=60ms  Done processing dataset 1

In this very simple example the CPU is really only running on one
thread at a time. What's nice about using a thread pool for this is that
it might optimize the use of the CPU.

In what way? What are you expecting a thread pool (as opposed to just
starting that many threads) to do for you automatically? I don't think
thread pools work the way you expect them to, I'm afraid.

I'll look at your email when I can - I'm looking after my 1-year-old
son for most of this week though, so it may be a little while.
 
JS

Jon Skeet [C# MVP] said:
In what way? What are you expecting a thread pool (as opposed to just
starting that many threads) to do for you automatically? I don't think
thread pools work the way you expect them to, I'm afraid.

The reason I'd use a thread pool instead of just spawning threads is
that I was under the impression that starting a new thread took some
time, and that the thread pool had less overhead in starting the
thread.

Another reason I thought it might be optimized is that each thread
would basically be doing:

WaitHandle [] waitEvents = this.GetInputEvents();
WaitHandle.WaitAll(waitEvents);
// do some work using inputs, filling outputs with data
this.SetAllOutputsReady();
this.ReadyEvent.Set();
return; // thread done

When my processing is triggered, I need to run all of the 30-50 threads
and have them all finish ASAP. So my view was that if our OS (XP) is
good, it ought to be better at scheduling thread processing than
anything I write. Perhaps I'm wrong.

Jon, no rush to look at that code I sent - have fun with your
1-year-old (I have been through it twice now!). You may find the
concept in the code mildly interesting.
All others: I could not post the code here because it was a little too
large. I can post to a website if anyone is interested. My e-mail is
jimnospam at lorusinc.com (change ' at ' to @ but don't remove the
'nospam').
 
Jon Skeet [C# MVP]

JS said:
The reason I'd use a thread pool instead of just spawning threads is
that I was under the impression that starting a new thread took some
time, and that the thread pool had less overhead in starting the
thread.

But the thread pool has a limited number of threads available - 25 per
processor by default. Thread pools just aren't designed for your kind
of work, where the thread will wait around for a long time before
finishing (either because it's doing work or because it's waiting for
other things).
Another reason I thought it might be optimized is that each thread
would basically be doing:

WaitHandle [] waitEvents = this.GetInputEvents();
WaitHandle.WaitAll(waitEvents);
// do some work using inputs, filling outputs with data
this.SetAllOutputsReady();
this.ReadyEvent.Set();
return; // thread done

But the thread pool can't get the thread to switch to another piece of
work during the WaitAll - the thread just sits there, blocked.
When my processing is triggered, I need to run all of the 30-50 threads
and have them all finish ASAP. So my view was that if our OS (XP) is
good, it ought to be better at scheduling thread processing than
anything I write. Perhaps I'm wrong.

I'm sure XP can handle 50 threads - although starting a new batch of 50
for every single job would be a bad idea. Have you considered having
one thread per *type* of step, with the same 50 threads running the
whole time, waiting for things to do?

However, it seems to me that you're unlikely to really *need* 50
threads. If one step *only* depends on the output of the previous step,
then they can effectively be combined into one step. Do this everywhere
you can, and you should be able to significantly reduce the number of
threads required. Similarly a thread which is needed for some parallel
work early on can then do some more parallel work later, when it's
available - there's no need to start a new thread for each of them.
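
For example, a purely linear run of steps can be collapsed into a single
work item that just calls them in order (rough sketch; the Step type and its
Run method are placeholders, not your actual code):

// Merge a linear chain of steps (each depending only on the previous one)
// into a single unit of work, so the whole chain needs just one thread
// or one queued work item.
public class CombinedStep
{
    private Step[] chain;

    public CombinedStep(Step[] chain)
    {
        this.chain = chain;
    }

    public void Run()
    {
        foreach (Step step in chain)
            step.Run();   // the output of one step feeds straight into the next
    }
}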
 
