runaway thread count and asynchronous sockets


Matthew Groch

Hi all,

I've got a server that handles a relatively high number of concurrent
transactions (on the magnitude of 1000's per second). Client
applications establish socket connections with the server. Data is
sent and received over these connections using the asynchronous model.

The server is currently in beta testing. Sporadically over the course
of the day, I'll observe the thread count on the process (via perfmon)
start climbing. (Let's say that on average, a 'normal' thread count is
around 80). But when it climbs, well.. Today, for example, I saw it
peak at around 355 threads.

Now, I don't explicitly spawn these threads in my code. I'm assuming
they're being spawned by the CLR to handle the high volume of async
socket sends. The count stays high for some (arbitrary?) period of
time and then, automagically, comes back down to normal levels again.
When the thread count is high, the system is obviously taxed and
performance goes down. I've written code specifically so that I keep a
flag associated with a socket session such that a session will not
initiate an async send if one is currently in progress, if that's
worth mentioning at all.
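
For what it's worth, the guard amounts to something like this (a
simplified sketch with invented names, not my actual code):

using System;
using System.Net.Sockets;
using System.Threading;

// Per-session guard: refuse to start a second BeginSend while one is
// still pending on this session's socket.
class SocketSession
{
    Socket socket;
    int sendInProgress;  // 0 = idle, 1 = a send is pending

    public SocketSession(Socket s)
    {
        socket = s;
    }

    // Returns false (and sends nothing) if a send is already pending
    public bool TrySend(byte[] data)
    {
        if (Interlocked.CompareExchange(ref sendInProgress, 1, 0) != 0)
            return false;

        socket.BeginSend(data, 0, data.Length, SocketFlags.None,
                         new AsyncCallback(OnSendComplete), socket);
        return true;
    }

    void OnSendComplete(IAsyncResult ar)
    {
        try
        {
            ((Socket)ar.AsyncState).EndSend(ar);
        }
        catch (SocketException)
        {
            // remote client closed the socket, etc.
        }
        catch (Exception)
        {
            // socket closed/disposed by one of my other threads
        }
        finally
        {
            // clear the flag so the next send may start
            Interlocked.Exchange(ref sendInProgress, 0);
        }
    }
}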

I'm hoping to hear from _anyone_ that might have encountered a similar
situation!! I'm not exactly sure what to do at this point. I'd love
any help and am particularly looking for responses like "Hey, have you
thought about this?" or "Hmm, maybe you're misunderstanding some
fundamental concept in async socket comm", or best, "Yeah, that sounds
similar to something we ran into last year; here's the strategy we
used to get around it.."

Thanks in advance-
 

Guest

FYI - I have a very similar scenario to yours; I'm running a C# service on
a dual HT Xeon processor box. I have a bunch of custom perfmon counters in
code and I know that ThreadPool worker thread counts stay right around 100
for me; I don't know what the IO completion thread counts are, as I haven't
had any issues with them. Be aware that the system often needs a 'shadow
thread' for every thread you spin off, so high thread counts in and of
themselves are not always an issue...
Data is
sent and received over these connections using the asynchronous model.

How exactly are you using async? Fire and forget, or BeginReceive and
sleep? In my case my service receives 3 separate pieces of data serially
before acting on the data, so I do 3 BeginReceive/sleeps in a row with my
IO completion handler setting the IO completion event each time. In each
case I sleep on IO completion and/or a service command event. This is
working very well for me.
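
The caller side of that pattern looks roughly like this (a trimmed-down
sketch; the class and names are invented for illustration, not my actual
code):

using System;
using System.Net.Sockets;
using System.Threading;

class Receiver
{
    // Signaled by the IO completion callback when a read finishes
    ManualResetEvent ioDone = new ManualResetEvent(false);
    // Signaled elsewhere when the service is told to stop
    ManualResetEvent stopCommand = new ManualResetEvent(false);

    // Start one async read, then sleep until the IO completes or the
    // service command event fires, whichever comes first
    public bool ReceiveOne(Socket socket, byte[] buffer)
    {
        ioDone.Reset();
        socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
                            new AsyncCallback(OnReceive), socket);

        int which = WaitHandle.WaitAny(
            new WaitHandle[] { ioDone, stopCommand });
        return which == 0;  // true if the read completed
    }

    void OnReceive(IAsyncResult ar)
    {
        try
        {
            ((Socket)ar.AsyncState).EndReceive(ar);
        }
        catch (SocketException)
        {
            // expected: remote client closed the socket, etc.
        }
        ioDone.Set();  // wake the sleeping caller
    }
}

Do that three times in a row and you have the receive-3-pieces-serially
flow I described.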

Also be aware that your IO completion handler should catch BOTH
SocketException and generic Exception --> if the socket is closed by a
worker thread within your code {aka if you shut down your program or
service and some other worker thread closes the socket} the IO completion
callback will fire with an ObjectDisposedException ("socket disposed"),
which only the generic Exception catch will pick up, aka:

// Called by a ThreadPool IO completion thread
void ReceiveIOCompletionCallback(IAsyncResult ar)
{
    try
    {
        // wait for IO to complete, then signal the BeginReceive caller thread
        ((Socket)ar.AsyncState).EndReceive(ar);
        ((ManualResetEvent)ar.AsyncWaitHandle).Set();
    }
    catch (SocketException)
    {
        // get here if the remote client closes the socket or on other
        // expected socket errors...
    }
    catch (Exception)
    {
        // get here (an ObjectDisposedException) if another of your own
        // threads disposes the socket
    }
}
'normal' thread count is
around 80). But when it climbs, well.. Today, for example, I saw it
peak at around 355 threads.
Now, I don't explicitly spawn these threads in my code. I'm assuming
they're being spawned by the CLR to handle the high volume of async
socket sends. The count stays high for some (arbitrary?) period of
time and then, automagically, comes back down to normal levels again.

If you are seeing runaway thread counts it could be several things. Are
they IO completion threads or are they worker threads?

If they are IO completion threads then you might be ok. In the IO
completion callback code above, an IO completion thread sleeps until the
IO completes, but it still counts against the process as an active thread.
Your high thread counts might just mean that you have many IO requests
pending and that there are many IO completion threads sleeping while IO is
in progress.

-OR- You could be getting a socket error or partial IO completion and many
IO completion threads are active to service the completed operations. If you
are doing partial read/write operations then you could see high thread counts
if operations are partially completing. In my stuff I don't allow partial IO
completion - I begin a write/read operation and sleep until the IO fully
completes.
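
To make the 'no partial completion' point concrete, here's roughly what I
mean (a sketch; the names are invented):

using System;
using System.Net.Sockets;

class BlockingReader
{
    // Keep issuing reads until the full message has arrived, sleeping
    // on each read's wait handle, so a partially completed read is
    // never treated as "done"
    public static void ReceiveExactly(Socket socket, byte[] buffer,
                                      int count)
    {
        int got = 0;
        while (got < count)
        {
            IAsyncResult ar = socket.BeginReceive(buffer, got,
                count - got, SocketFlags.None, null, null);
            ar.AsyncWaitHandle.WaitOne();  // sleep until the IO completes
            int n = socket.EndReceive(ar);
            if (n == 0)
                throw new SocketException();  // remote side closed
            got += n;
        }
    }
}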

On the other hand if your high thread counts are due to your worker threads
then invariably you have an issue with the logic related to spinning off a
thread - some condition is triggering your logic to spin off many threads -
you'll probably want to fix that...
The count stays high for some (arbitrary?) period of
time and then, automagically, comes back down to normal levels again.
When the thread count is high, the system is obviously taxed and
performance goes down.

How long until the automagic brings thread counts back down? A few seconds
or something along the lines of a socket timeout?
I've written code specifically so that I keep a
flag associated with a socket session such that a session will not
initiate an async send if one is currently in progress, if that's
worth mentioning at all.

Why only one at a time? One great advantage of async IO is having many
things happening at the same time. You can queue up several reads and
writes; they are guaranteed to be serviced in the order in which they are
queued...

In my stuff I have coded a 'waitable semaphore' as a throttling mechanism.
All socket activity is throttled by the configurable maximum number of
threads the semaphore will permit. At different points in code threads wait
for IO completion, a semaphore reference, and/or a service command or any
arbitrary combination of those events...
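
Bare-bones, the throttling piece amounts to something like this (a
hand-rolled sketch, assuming .NET 1.1, which has no built-in Semaphore
class; my real one is event-based so it can be combined with
WaitHandle.WaitAny alongside the IO and service command events):

using System.Threading;

// Bare-bones counting semaphore built on Monitor. Threads block in
// Acquire() once all permits are handed out; Release() wakes one waiter.
class WaitableSemaphore
{
    object gate = new object();
    int count;

    public WaitableSemaphore(int initialCount)
    {
        count = initialCount;
    }

    public void Acquire()
    {
        lock (gate)
        {
            while (count == 0)
                Monitor.Wait(gate);  // sleep until a permit is released
            count--;
        }
    }

    public void Release()
    {
        lock (gate)
        {
            count++;
            Monitor.Pulse(gate);  // wake one waiting thread
        }
    }
}

Every thread that wants to touch a socket does an Acquire() first and a
Release() when its IO completes, so the configured count caps how much
socket work is in flight at once.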

--Richard
 

Matthew Groch

Richard said:
FYI - I have a very similar scenario to yours; I'm running a C# service on
a dual HT Xeon processor box. I have a bunch of custom perfmon counters in
code and I know that ThreadPool worker thread counts stay right around 100
for me; I don't know what the IO completion thread counts are, as I haven't
had any issues with them. Be aware that the system often needs a 'shadow
thread' for every thread you spin off, so high thread counts in and of
themselves are not always an issue...


How exactly are you using async? Fire and forget, or BeginReceive and
sleep? In my case my service receives 3 separate pieces of data serially
before acting on the data, so I do 3 BeginReceive/sleeps in a row with my
IO completion handler setting the IO completion event each time. In each
case I sleep on IO completion and/or a service command event. This is
working very well for me.

Well, I guess it's more or less 'fire and forget' in that I initiate a
BeginReceive/BeginSend and then have functions to handle the
completion of the async call (i.e. via EndReceive/EndSend).
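
I.e., something along these lines (simplified, with invented names):

using System;
using System.Net.Sockets;

class Sender
{
    // Initiate the send and return immediately; no waiting here
    public void Send(Socket socket, byte[] data)
    {
        socket.BeginSend(data, 0, data.Length, SocketFlags.None,
                         new AsyncCallback(OnSendComplete), socket);
    }

    // The completion handler does the EndSend when the IO finishes
    void OnSendComplete(IAsyncResult ar)
    {
        try
        {
            ((Socket)ar.AsyncState).EndSend(ar);
        }
        catch (SocketException)
        {
            // remote client closed the socket, etc.
        }
        catch (ObjectDisposedException)
        {
            // socket closed by one of my own threads
        }
    }
}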
Also be aware that your IO completion handler should catch BOTH
SocketException and generic Exception --> if the socket is closed by a
worker thread within your code {aka if you shut down your program or
service and some other worker thread closes the socket} the IO completion
callback will fire with an ObjectDisposedException ("socket disposed"),
which only the generic Exception catch will pick up, aka:

// Called by a ThreadPool IO completion thread
void ReceiveIOCompletionCallback(IAsyncResult ar)
{
    try
    {
        // wait for IO to complete, then signal the BeginReceive caller thread
        ((Socket)ar.AsyncState).EndReceive(ar);
        ((ManualResetEvent)ar.AsyncWaitHandle).Set();
    }
    catch (SocketException)
    {
        // get here if the remote client closes the socket or on other
        // expected socket errors...
    }
    catch (Exception)
    {
        // get here (an ObjectDisposedException) if another of your own
        // threads disposes the socket
    }
}

Check. I've got those catches in my code.
If you are seeing runaway thread counts it could be several things. Are
they IO completion threads or are they worker threads?

I'm assuming IO completion threads. I'm not using the ThreadPool
explicitly. I spawn a very well-defined number of Thread objects
explicitly for very specific work that goes on in the background
throughout the lifetime of the process. For all async socket comm, I
let the CLR do its magic...
If they are IO completion threads then you might be ok. In the IO
completion callback code above, an IO completion thread sleeps until the
IO completes, but it still counts against the process as an active thread.
Your high thread counts might just mean that you have many IO requests
pending and that there are many IO completion threads sleeping while IO is
in progress.

-OR- You could be getting a socket error or partial IO completion and many
IO completion threads are active to service the completed operations. If you
are doing partial read/write operations then you could see high thread counts
if operations are partially completing. In my stuff I don't allow partial IO
completion - I begin a write/read operation and sleep until the IO fully
completes.

On the other hand if your high thread counts are due to your worker threads
then invariably you have an issue with the logic related to spinning off a
thread - some condition is triggering your logic to spin off many threads -
you'll probably want to fix that...

Yeah, no worker threads (i.e. via ThreadPool) in my code, so I'm
pretty sure it's something in the IOCP arena...
How long until the automagic brings thread counts back down? A few seconds
or something along the lines of a socket timeout?

When it's happened, the duration has been inconsistent, but certainly
significantly longer than a socket timeout (i.e. > 5-10 min sometimes)
Why only one at a time? One great advantage of async IO is having many
things happening at the same time. You can queue up several reads and
writes; they are guaranteed to be serviced in the order in which they are
queued...

In my stuff I have coded a 'waitable semaphore' as a throttling mechanism.
All socket activity is throttled by the configurable maximum number of
threads the semaphore will permit. At different points in code threads wait
for IO completion, a semaphore reference, and/or a service command or any
arbitrary combination of those events...

--Richard

Well, I think this implementation is sort of analogous to the code
snippet you've provided above. Basically, I have a thread that goes
through a queue and pulls out data that needs to be forwarded, via a
socket connection, to one of many connected clients. The thread takes
the data, initiates an async send, and then goes back to the top of its
loop without waiting for the send to complete (thus, async). If the
thread pulls data off the queue that needs to be sent to a client that
is currently in the process of receiving data (asynchronously), it
defers that send. This is done to try to avoid the 'pile-up' situation
with a huge number of sends pending... (and the ensuing high thread
count associated with it, context switching costs, etc..)
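
In rough outline, the loop does something like this (a simplified sketch
with invented names, reusing the SocketSession.TrySend guard from the
sketch in my first post):

using System.Collections;
using System.Threading;

// Simplified dispatch loop: 'outbound' is fed by other threads;
// 'sessions' maps a client id to the session holding that client's
// socket and send-in-progress flag.
class Dispatcher
{
    Queue outbound = Queue.Synchronized(new Queue());
    Hashtable sessions = new Hashtable();
    volatile bool running = true;

    public void Run()
    {
        while (running)
        {
            if (outbound.Count == 0)
            {
                Thread.Sleep(1);  // nothing queued; don't spin hot
                continue;
            }

            object[] item = (object[])outbound.Dequeue();  // { clientId, data }
            SocketSession s = (SocketSession)sessions[item[0]];

            // A send to this client is already in flight: defer by
            // requeueing rather than piling up pending sends (the real
            // code preserves per-session order; this just shows the idea)
            if (!s.TrySend((byte[])item[1]))
                outbound.Enqueue(item);
        }
    }
}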

To note, the 'processing' threads that I explicitly spawn are assigned
Highest priority (this is due to the real-time nature of the server..
lags/delays are viewed by users very poorly). With that being the case,
I've added Thread.Sleep(0) directives at the end of each of my
explicitly-spawned threads' loops and am testing the impact of that
today in the beta environment. I've theorized that my high-priority
threads are starving the IOCP threads (which, I'm assuming, run at
normal priority), barring them from invoking their callbacks; in
high-traffic periods, this results in the thread count buildup I've
observed...
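
Concretely, each processing thread now looks something like this
(simplified):

using System.Threading;

class Processor
{
    volatile bool running = true;

    public void Start()
    {
        Thread t = new Thread(new ThreadStart(Loop));
        t.Priority = ThreadPriority.Highest;  // real-time responsiveness
        t.Start();
    }

    void Loop()
    {
        while (running)
        {
            DoWork();
            Thread.Sleep(0);  // yield the rest of the time slice
        }
    }

    void DoWork()
    {
        // pull work, process, forward to clients...
    }
}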

I went out on the newsgroups and there seemed to be some confusion as
to whether Thread.Sleep(0) would have any impact if the statement were
executed by a thread with a high priority vis-a-vis other
lower-priority threads - the idea being that because the other threads
have lower priority, the Sleep(0) invocation would do nothing at all
and the high-priority thread would keep on running. I personally don't
know the definitive answer to this yet.. continuing to investigate. I
also changed the base priority on the process itself (to High). Not
sure of the _exact_ implications of this on thread scheduling, etc...

Anyway, thanks for the comments and suggestions. I'll see how today's
server run goes and post a follow-up. If you have any clarification on
the whole thread prioritization/scheduling issue, that would be much
appreciated as well!
 

Matthew Groch

Ok, here's a follow-up for any that might find interest in this
thread:

The Thread.Sleep(0) directives did not improve the situation with the
runaway thread count problem. Then, I decided to try another
experiment. Very simply put, I gave _all_ of the threads that I
explicitly spawn in my server normal priority. So far, this appears
to have addressed the situation.

My theory is that because I had high-priority threads always running
in the background, they effectively starved the normal-priority
threads spawned by the CLR to handle the async socket communication.
These threads pretty much backed up on themselves while waiting to be
serviced by the system when traffic on the server breached certain
thresholds.


 

Willy Denoyette [MVP]

You should take care with high-priority threads, as they will disturb the
functioning of the finalizer thread and, indirectly, the garbage
collector.

Willy.
 
