Multipost (sorry): .NET blocking, thread pools, and other stuff

David Sworder

This message was already cross-posted to C# and ADO.NET, but I forgot to
post to this "general" group... sorry about that. It just occurred to me
after my first post that the "general" group readers might have some
thoughts on this perplexing .NET blocking issue.

(see below)

=====
Hi,

I'm developing an application that will support several thousand
simultaneous connections on the server-side. I'm trying to maximize
throughput. The client (WinForms) and server communicate via a socket
connection (no remoting, no ASP.NET). The client sends a message to the
server that contains some instructions and the server responds in an
asynchronous fashion. In other words, the client doesn't block while waiting
for the response. The request might be processed on the server in a split
second or it might take a few minutes. In any case, the client doesn't wait
around for the response. It happily goes about servicing the user and when
the server gets around to responding, the client makes that response data
available to the user.

Now let's look at what's happening on the server. The server basically
receives a message from the client on an I/O thread-pool thread and does as
much as it can to process that request. Sometimes the server logic realizes
that it must talk to another server to process the request. In that case,
the thread-pool thread on the server does NOT block. It simply sends its
request asynchronously to another server and then the thread returns to the
pool. When the secondary server does its thing, it'll notify the main server
via an existing socket connection, an I/O thread from the .NET thread pool
is assigned the task of reading this information, and this thread eventually
sends the results back to the client. Again, my desire is to eliminate
thread-blocking on the server whenever possible since thread-pool threads
are a limited resource (25/CPU, I believe). Make sense so far? ok...

...but now I'm in a bit of a pickle [not literally, that's just an
expression]. I'm now required to have the server respond to a certain type
of client request that requires that a SQL Server database be contacted. The
request might call for a SELECT statement or INSERT/UPDATE/DELETE or for a
stored proc to be executed. So I thought "no problem" -- I'll just use
ADO.NET. The problem though is that ADO.NET is synchronous and "blocking" by
nature. There is no "BeginFill()" for example to asynchronously fill a
DataSet. What are the implications of this? Well, for starters, this means
that whenever my server is waiting for a reply from ADO.NET, the thread-pool
thread that is making the call just blocks. This might not sound like such a
big deal but on a two CPU machine that has 50 threads in the pool, I could
easily encounter a situation where all 50 threads are blocked waiting for
ADO.NET responses -- which means that there are no threads left to process
incoming requests. Even *simple* requests that don't require database access
must sit grudgingly in the queue because all of the thread-pool threads are
sitting idly in a blocked state waiting for ADO.NET calls to return! Not
good!

Now you might say, "Listen, brother...Just use delegates and
BeginInvoke() to simulate asynchronous behavior for your database calls."
This won't work because of course BeginInvoke() simply will place the call
in the thread pool queue and then make the call synchronously on a
thread-pool thread, so a pool thread still ends up blocked. You might
say, "My friend... Use the .NET API command that increases the number of
threads in the pool." This doesn't solve the problem though and it's hackish
and it implies that I know better than MSFT how many threads should be in
the pool [which I don't].

What other approach might I take? Perhaps I should create a bunch of
threads manually and use these for database access? It's ugly, but at least
I'm not wasting my valuable thread pool threads which are used for
processing of incoming requests.
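
As a rough illustration of that last idea, here is a minimal sketch, written in Python only to keep it short and language-neutral. The names DedicatedWorkers and fake_query are made up for the example, and fake_query merely stands in for a blocking ADO.NET call. A small, fixed set of hand-created threads drains a queue of database jobs, so the thread-pool threads never have to wait on the database:

```python
import queue
import threading
import time


class DedicatedWorkers:
    """A fixed set of manually created threads that run blocking jobs."""

    def __init__(self, count):
        self._jobs = queue.Queue()
        for _ in range(count):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, job, on_done):
        """Queue a blocking job and return immediately.

        on_done(result, error) is later called on the dedicated worker
        thread, not on the thread that called submit().
        """
        self._jobs.put((job, on_done))

    def _run(self):
        while True:
            job, on_done = self._jobs.get()
            try:
                try:
                    result, error = job(), None
                except Exception as exc:  # hand the failure to the callback
                    result, error = None, exc
                on_done(result, error)
            finally:
                self._jobs.task_done()

    def wait_idle(self):
        """Block until every queued job has finished (handy for demos)."""
        self._jobs.join()


def fake_query():
    time.sleep(0.05)  # hypothetical stand-in for a blocking database call
    return "3 rows"


db_workers = DedicatedWorkers(count=2)
replies = []
for _ in range(4):
    db_workers.submit(fake_query, lambda result, error: replies.append(result))
db_workers.wait_idle()
```

The worker count would presumably be chosen to match how many database connections are worth keeping busy. In .NET the same shape could be built from manually created Thread objects and a synchronized queue.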

I'm going to give MSFT the benefit of the doubt that they made ADO.NET a
synchronous blocking animal for good reason... and they're right, I suppose,
because 99% of the time it makes sense to make ADO.NET calls in a
synchronous fashion. But in my case, hopefully you'll see my plight and have
a clever workaround for me!

Peace be with you!

David
 
David Sworder

Thanks for the thoughts, Alex. See follow ups below.

AlexS said:
Yes, I would put blocking calls into threads.
But - No, I wouldn't worry about blocking nature of ADO calls at all.
Because Begin calls are also using threads.

Are you sure? Let's think about sockets, for example. If I do a
BeginReceive() against a socket and no data is received for 20 seconds, .NET
doesn't have a thread-pool thread sitting there in a blocked state for 20
seconds. (correct?) My understanding was that .NET waits for the TCP/IP
stack to say "hey, i've got some incoming data" at which point it pulls an
I/O thread from the thread pool to notify the application of the incoming
data.

Now, when calling BeginInvoke() against a delegate, things work exactly
as you've described. The thread pool just executes the delegate method
synchronously on a thread-pool thread -- and if this code blocks, then so be
it.

AlexS said:
[...] whatever scheme you will choose by db server. For end user it doesn't matter
if "Please wait" is related to long-running query or absence of free
threads to process request. Finally, when all possible connections to DB are
established and busy what can you do with next request?

Ah, I'm glad you asked! If a request comes that requires an ADO.NET call
and all possible connections to the DB are established, then certainly that
request has to wait. No argument there... BUT.. what if the request doesn't
require database access? What if the request simply wants the server to
return the current time and date? That doesn't require database access, yet
since all of my thread-pool threads are blocking waiting on ADO.NET the
simple request must be queued instead of being immediately processed.

AlexS said:
Btw, when you don't block client request you are in danger. Suppose your
request takes a long time. Impatient user might issue it once again and
again - I've seen such cases. Are you sure you want to process 100s of same
requests when user wants to see results of only one? It's good way to
exhaust all available resources and clog all the communication lines with
duplicate information. Perfect DOS attack.

Yes! Very good point. Because of the nature of this app, the user can't
submit the same request twice (prohibited by the WinForms logic) -- but
you've made an excellent case for making my server code a bit more robust so
that it doesn't get overwhelmed with excess requests by a malicious person.
Maybe I need to limit the number of concurrent requests per client IP
address. I'll have to give that some thought.
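
A per-address cap along those lines might look roughly like this. It is only a sketch in Python, not the server's real code: PerClientLimiter and its methods are hypothetical names, and the addresses are documentation-range placeholders. The server would call try_acquire() when a request arrives and release() once the reply has been sent:

```python
import threading


class PerClientLimiter:
    """Caps how many requests one client address may have in flight."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._counts = {}  # client address -> requests currently in flight
        self._lock = threading.Lock()

    def try_acquire(self, client_ip):
        """Return True and count the request, or False if the cap is hit."""
        with self._lock:
            current = self._counts.get(client_ip, 0)
            if current >= self._max:
                return False
            self._counts[client_ip] = current + 1
            return True

    def release(self, client_ip):
        """Call when a request finishes, whether it succeeded or failed."""
        with self._lock:
            remaining = self._counts.get(client_ip, 0) - 1
            if remaining > 0:
                self._counts[client_ip] = remaining
            else:
                self._counts.pop(client_ip, None)  # drop idle clients


limiter = PerClientLimiter(max_in_flight=2)
first = limiter.try_acquire("192.0.2.10")
second = limiter.try_acquire("192.0.2.10")
third = limiter.try_acquire("192.0.2.10")   # over the cap for this address
other = limiter.try_acquire("192.0.2.99")   # a different client is unaffected
limiter.release("192.0.2.10")
retry = limiter.try_acquire("192.0.2.10")   # a slot is free again
```

Note that clients behind a shared NAT or proxy appear as one address, so the cap would need some headroom.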

Thanks again for the response,

David
 
AlexS

Yes, I would put blocking calls into threads.
But no, I wouldn't worry about the blocking nature of ADO calls at all,
because Begin calls also use threads.

Once you have used up all available resources for threads, whether 25 or
25000, you will need to tell the next requesting user "Please wait" when a
request is submitted. And you will have to show the same "Please wait" to any
requestor when you can't guarantee an immediate answer. So a simple scheme
could be based on just a queue of requests or messages, processed by the db
server in FIFO order or whatever scheme you choose. For the end user it
doesn't matter whether "Please wait" is due to a long-running query or to the
absence of free threads to process the request. Finally, when all possible
connections to the DB are established and busy, what can you do with the next
request?

Simple picture: standard MSDE doesn't allow more than 5 simultaneous
connections. Does it make any sense to have more than 5 threads for db
access? The 6th one will have to wait until one of the connected ones
finishes. So why expend resources on the 6th one? And take note of what MSDE
says when you try to connect more than 5 sessions.

Btw, when you don't block the client request you are in danger. Suppose your
request takes a long time. An impatient user might issue it again and
again - I've seen such cases. Are you sure you want to process 100s of
identical requests when the user wants to see the results of only one? It's
a good way to exhaust all available resources and clog all the communication
lines with duplicate information. A perfect DoS attack.

Try to keep it simple... And expect the unexpected ;-)

HTH
Alex


AlexS

David,
see my comments inline.

David Sworder said:
Thanks for the thoughts, Alex. See follow ups below.


Are you sure? Let's think about sockets, for example. If I do a
BeginReceive() against a socket and no data is received for 20 seconds, .NET
doesn't have a thread-pool thread sitting there in a blocked state for 20
seconds. (correct?) My understanding was that .NET waits for the TCP/IP
stack to say "hey, i've got some incoming data" at which point it pulls an
I/O thread from the thread pool to notify the application of the incoming
data.

And this is done by some thread in the system: the IP stack or whatever
server you have servicing these calls. Check the ReceiveCompleted event. It's
the same story as with user-defined delegates. So there is no real difference
except maybe in implementation details like polling versus interrupt
listening. Even if it is not your thread, you are using it during the waiting
period. For BeginReceive you either attach to ReceiveCompleted or specify an
AsyncCallback in your code.

Btw, I would appreciate asynchronous Begin calls for ADO and RegEx too.

David Sworder said:
Ah, I'm glad you asked! If a request comes that requires an ADO.NET call
and all possible connections to the DB are established, then certainly that
request has to wait. No argument there... BUT.. what if the request doesn't
require database access? What if the request simply wants the server to
return the current time and date? That doesn't require database access, yet
since all of my thread-pool threads are blocking waiting on ADO.NET the
simple request must be queued instead of being immediately processed.

You might implement a dispatcher that routes requests to different pools of
threads - for example, an immediate-threads pool and a long-threads pool.
However, this doesn't really make any difference. You might be flooded with
immediate requests too, up to the number of possible connections for your
server. So it's the standard limited-resources problem. You will always have
a bottleneck somewhere: in the IP stack (max connections), the db, or your
server. As soon as you have limited resources you have to queue requests and
choose a prioritization scheme - FIFO, LIFO, whatever. As soon as the queue
fills up you have to reject or save subsequent requests. Or scale up your
servers and invest in load balancing, distributed databases, clusters,
additional communication lines, etc. Is it worth it? You decide.
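
For what it's worth, the dispatcher idea can be sketched in a few lines. This is an illustration only, in Python rather than .NET. Dispatcher, slow_query and current_time are invented names, and the cap on pending database work (which counts running plus queued requests) is an assumed policy, not something from the server under discussion:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Dispatcher:
    """Routes requests to an 'immediate' pool or a bounded 'long' pool."""

    def __init__(self, fast_workers, slow_workers, max_pending_slow):
        self._fast = ThreadPoolExecutor(max_workers=fast_workers)
        self._slow = ThreadPoolExecutor(max_workers=slow_workers)
        # Limits db requests that are running or waiting in the queue.
        self._slow_slots = threading.BoundedSemaphore(max_pending_slow)

    def submit(self, handler, needs_db):
        """Return a future, or None when the long queue is full."""
        if not needs_db:
            return self._fast.submit(handler)
        if not self._slow_slots.acquire(blocking=False):
            return None  # queue filled up: the caller must reject or save
        future = self._slow.submit(handler)
        future.add_done_callback(lambda _: self._slow_slots.release())
        return future

    def shutdown(self):
        self._fast.shutdown(wait=True)
        self._slow.shutdown(wait=True)


gate = threading.Event()


def slow_query():
    gate.wait(timeout=5)  # hypothetical stand-in for a long database call
    return "rows"


def current_time():
    return "12:00"  # placeholder for a request that needs no database


dispatcher = Dispatcher(fast_workers=2, slow_workers=1, max_pending_slow=2)
first = dispatcher.submit(slow_query, needs_db=True)
second = dispatcher.submit(slow_query, needs_db=True)
rejected = dispatcher.submit(slow_query, needs_db=True)  # over the cap
quick = dispatcher.submit(current_time, needs_db=False)
quick_reply = quick.result(timeout=1)  # answered while the db pool is busy
gate.set()
slow_replies = [first.result(timeout=5), second.result(timeout=5)]
dispatcher.shutdown()
```

The simple request is answered while both database slots are taken, but the third database request still has to be turned away, which is the limited-resources point made above.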

Anyway, IMO it's not worth the effort to invent complex pooling schemes
that could easily be brought down by the sheer volume of requests. That's why
I suggest the simplest possible implementation. At least it will be easier to
analyze when problems occur.

Btw, your remark implies that you prioritize requests. Simple server-only
requests get the highest priority just because they don't require db access.
Are you sure that's what you want?

It might be good to take a look at distributed systems theory -
massive concurrency, massive data processing, proving communication
protocols, etc. For example, see http://vl.fmnet.info/concurrent/.
Client-server and .NET threading are just particular cases of the general
problem.
HTH
Alex
 
