Thread Question - When do they die?

J

jehugaleahsa

I have some code that looks like this:

int[] customerIds = { 1, 2, 3, 4, 5, 6, 7, 8 };
List<Thread> threads = new List<Threads>();
foreach (int customerId in customerIds)
{
Thread thread = new Thread(delegate()
{
// do calculation with customerId
});
threads.Add(thread);
thread.Start();
}
foreach (Thread thread in threads)
{
thread.Join();
}

First, when I run this code, it appears as though 8 is executed
multiple times. My guess is that the thread continues on after the
delegate finishes, and since MoveNext() on the List<int> just returns
the last item over an over, we see 8 multiple times. Is that accurate?

Furthermore, something seems wrong about storing threads in a
List<Thread> and then joining. Is there a better way to Join these
theads without keeping a reference to them around?

Thanks,
Travis
 
J

jehugaleahsa

First, when I run this code, it appears as though 8 is executed
multiple times. My guess is that the thread continues on after the
delegate finishes, and since MoveNext() on the List<int> just returns
the last item over an over, we see 8 multiple times. Is that accurate?

I see I was totally wrong. Is this caused by me using anonymous
methods?
 
P

Peter Duniho

First, when I run this code, it appears as though 8 is executed
multiple times. My guess is that the thread continues on after the
delegate finishes, and since MoveNext() on the List<int> just returns
the last item over an over, we see 8 multiple times. Is that accurate?

Not really. As you guessed, this has to do with the anonymous method. In
addition, the behavior you're seeing is non-deterministic. If for some
reason the thread creating the other threads got interrupted before it
finished the loop, you'd see other values than "8".

What's going on is that the variable "customerId" is "captured" when it's
used in the anonymous method. Instead of it being stored locally, the
compiler creates a variable on the heap that's used instead, so that the
lifetime of the variable can be long enough for it to be used by the
anonymous method.

But the variable only exists once. All of your anonymous methods are
using the same variable, and none of them get to execute (in the test you
ran) until all of the threads have been created and started, by which time
that variable holds the value "8". And that's the value that all the
threads use.

You can easily create a new instance of the variable for each thread by
defining a new variable within the block. Since according to the C# block
rules, the variable is created anew with each iteration of the loop, each
anonymous method winds up with its own variable. For example:

foreach (int i in customerIds)
{
int customerId = i;

Thread thread = new Thread(delegate()
{
// do calculation with customerId
});
threads.Add(thread);
thread.Start();
}
Furthermore, something seems wrong about storing threads in a
List<Thread> and then joining. Is there a better way to Join these
theads without keeping a reference to them around?

Well, I don't see the problem with keeping the reference to them around.
It seems like a reasonably elegant solution, assuming what you really want
to do is to not proceed until all of the threads have completed. But if
you don't like that technique, you could use other synchronization
techniques. For example:

AutoResetEvent are = new AutoResetEvent(false);
int cthread;
int[] customerIds = { 1, 2, 3, 4, 5, 6, 7, 8 };

cthread = customerIds.Length;
foreach (int i in customerIds)
{
int customerId = i;

Thread thread = new Thread(delegate()
{
// do calculation with customerId
if (Interlocked.Decrement(ref cthread) == 0)
{
are.Set();
}
});
thread.Start();
}

are.WaitOne();

That creates a counter equal to the number of threads, decrementing it at
the end for each thread. When the counter reaches zero, the thread that
set it to zero sets an event that allows the original thread to continue..

Pete
 
J

jehugaleahsa

Well, I don't see the problem with keeping the reference to them around.  
It seems like a reasonably elegant solution, assuming what you really want 
to do is to not proceed until all of the threads have completed.  But if 
you don't like that technique, you could use other synchronization  
techniques.  For example:

     AutoResetEvent are = new AutoResetEvent(false);
     int cthread;
     int[] customerIds = { 1, 2, 3, 4, 5, 6, 7, 8 };

     cthread = customerIds.Length;
     foreach (int i in customerIds)
     {
         int customerId = i;

         Thread thread = new Thread(delegate()
         {
             // do calculation with customerId
             if (Interlocked.Decrement(ref cthread) == 0)
             {
                 are.Set();
             }
         });
         thread.Start();
     }

     are.WaitOne();

That creates a counter equal to the number of threads, decrementing it at  
the end for each thread.  When the counter reaches zero, the thread that 
set it to zero sets an event that allows the original thread to continue.

Pete

That is complicated. If you say keeping it around is elegant, I'm all
for it.

This is an issue I ran into today when my lead developer claimed that
my C# implementation couldn't compete with the old C++ implementation.
Even in debug, mine whiped the floor.

However, he wanted to see if mine could handle being multi-threaded.
However, since all my static variables are read-only and all other
variables are local, I knew the problem I was experiencing was
somewhere else. I was able to track it down to a similar example as
above. I am glad you could explain it to me.

Tomorrow, I should be able to go into work and crush him yet again. He
is pretty smart to attack my competitive side when it comes to high-
performance code.

Speaking of which, one time when I ran some code it took 46 seconds.
The next time I ran it, it took 12 seconds. Does the JIT, or the MS
equivilent, hang around across debug sessions in VS? or am I
experiencing ups and downs on our database system, perhaps?

Thanks,
Travis
 
P

Peter Duniho

[...]
Tomorrow, I should be able to go into work and crush him yet again. He
is pretty smart to attack my competitive side when it comes to high-
performance code.

I suppose. I'm not personally a big fan of "crushing" my co-workers, but
if that's how your work team finds it best for producing quality code, go
for it. :)
Speaking of which, one time when I ran some code it took 46 seconds.
The next time I ran it, it took 12 seconds. Does the JIT, or the MS
equivilent, hang around across debug sessions in VS? or am I
experiencing ups and downs on our database system, perhaps?

Hard to answer without knowing what it is you're measuring. If you're
measuring the total run time of an application, there can easily be a
large delay the first time running it that you don't see in subsequent
times, as a variety of things get loaded for the first run but then remain
cached in memory, ready to go for subsequent runs.

One part of .NET I haven't paid much attention to is the Global Assembly
Cache. As the name suggests, it's a cache of some sort, but I don't know
everything it does. I see it mentioned most commonly with respect to
people trying to get their code in the cache. :) Maybe when your
assembly is first run, the JIT-compiled code winds up there, speeding up
subsequent executions.

If you've got a timer on a specific section of code, then depending on
what that code actually does, again it could be a simple matter of disk
caching, or if your application is interacting with external processes
(you mention a "database system", which may include an actual database
server) then there could be caching going on there.

I'd say that the most likely explanation is caching _somewhere_, but
exactly where is hard to say without knowing more about what you're
measuring.

Pete
 
J

Jeroen Mostert

Peter said:
First, when I run this code, it appears as though 8 is executed
multiple times. My guess is that the thread continues on after the
delegate finishes, and since MoveNext() on the List<int> just returns
the last item over an over, we see 8 multiple times. Is that accurate?

Not really. As you guessed, this has to do with the anonymous method.
In addition, the behavior you're seeing is non-deterministic. If for
some reason the thread creating the other threads got interrupted before
it finished the loop, you'd see other values than "8".

What's going on is that the variable "customerId" is "captured" when
it's used in the anonymous method. Instead of it being stored locally,
the compiler creates a variable on the heap that's used instead, so that
the lifetime of the variable can be long enough for it to be used by the
anonymous method.

But the variable only exists once. All of your anonymous methods are
using the same variable, and none of them get to execute (in the test
you ran) until all of the threads have been created and started, by
which time that variable holds the value "8". And that's the value that
all the threads use.

You can easily create a new instance of the variable for each thread by
defining a new variable within the block. Since according to the C#
block rules, the variable is created anew with each iteration of the
loop, each anonymous method winds up with its own variable. For example:

foreach (int i in customerIds)
{
int customerId = i;

Thread thread = new Thread(delegate()
{
// do calculation with customerId
});
threads.Add(thread);
thread.Start();
}
Furthermore, something seems wrong about storing threads in a
List<Thread> and then joining. Is there a better way to Join these
theads without keeping a reference to them around?

Well, I don't see the problem with keeping the reference to them
around. It seems like a reasonably elegant solution, assuming what you
really want to do is to not proceed until all of the threads have
completed. But if you don't like that technique, you could use other
synchronization techniques. For example:

AutoResetEvent are = new AutoResetEvent(false);
int cthread;
int[] customerIds = { 1, 2, 3, 4, 5, 6, 7, 8 };

cthread = customerIds.Length;
foreach (int i in customerIds)
{
int customerId = i;

Thread thread = new Thread(delegate()
{
// do calculation with customerId
if (Interlocked.Decrement(ref cthread) == 0)
{
are.Set();
}
});
thread.Start();
}

are.WaitOne();

That creates a counter equal to the number of threads, decrementing it
at the end for each thread. When the counter reaches zero, the thread
that set it to zero sets an event that allows the original thread to
continue.
What a coincidence, I wrote something very much like this today to solve a
similar problem -- I needed to have N things running in semi-parallel with a
main thread continuing when all of those things were done. I used the thread
pool and a monitor rather than an event, but it's the same principle. In
fact, I initially made the same error as the OP with my anonymous
delegate... The corrected version looks like this:

int itemsRemaining = fooCollection.Count;
object countMonitor = new object();
foreach (Foo foo in fooCollection) {
ThreadPool.QueueUserWorkItem(
delegate(object state) {
((Foo) state).Bar();
lock (countMonitor) {
--itemsRemaining;
if (itemsRemaining == 0) Monitor.Pulse(countMonitor);
}
},
foo
);
}

lock (countMonitor) {
if (itemsRemaining != 0) Monitor.Wait(countMonitor);
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top