context switch

Ryan Liu · Jun 2, 2008

Hi,

I have a client/server application, using a one thread/client approach.

I see a very high context switches/sec count. What are the ways to reduce
it? If I don't reduce the number of threads, should each thread sleep longer
in its endless loop?

On the same machine, I also run a mysql server. I can see the same number of
threads in mysqld, so it seems mysql is also using one thread/client. But its
context switch count is much lower. How can I get the same performance?

Thanks a lot!
Ryan

Nicholas Paldino [.NET/C# MVP] · Jun 2, 2008

Ryan,

Generally, you are going to have context switches when you have more
threads that are running than there are processors to handle the threads.

How are you allocating the threads? Generally, you should probably use
the ThreadPool to assign tasks to your threads, as the ThreadPool will take
into account the number of processors on the machine to determine how many
threads to keep in the pool (and minimize context switches).

Also, are you sure the context switches are causing a performance
impact? If you switch to fewer threads, how do you know you will get the same
concurrency that you have now?

Ryan Liu · Jun 2, 2008

Nicholas, thanks for your quick response.

I have 120 client machines connected to one server for the C/S application.
MySQL runs on the same server machine. Each client keeps one connection to
the db server as well. The clients and the server exchange very small
amounts of data, but very frequently.

I am using sync I/O to read from the clients, and
ThreadPool.QueueUserWorkItem to send the responses back.

I haven't used the ThreadPool to read data from the clients, because 1: my
current approach is easy to code, and 2: I am afraid that if I hand the
reads to the ThreadPool while the server workload is high, my receive jobs
will fail to be queued or will wait in the queue for too long. Is the
ThreadPool suitable for a C/S application that constantly scans and tries to
receive data from clients?

I think maybe I could wait longer in each loop, or use one thread for 10
clients: in each loop, scan the 10 clients one by one, and sleep less in
each loop, so the interval for each client stays about the same. Will that
help a lot?

Now it sleeps 100 ms in each loop. As I said, I have 120 loops running.
Generally, is 100 ms a suitable amount?

Thanks!

Nicholas Paldino [.NET/C# MVP] · Jun 2, 2008

Peter,

Well, the statement about the thread pool is true, in the sense that it
takes the number of processors into account when figuring out how many
threads it should keep in the pool. I didn't say that it kept a number of
threads equal to the number of processors =)

The new parallel task library is better at this, I believe (the new
System.Threading.dll).

Ryan Liu · Jun 2, 2008

Peter Duniho said:
The simple answer is: discard the "one thread/client" design. It doesn't
scale well on Windows.

Use the asynchronous API on the i/o classes (especially network i/o, which
I assume is what you're talking about), and that will take advantage of
the built-in mechanisms in Windows designed to prevent excessive context
switching and minimizing thread count (i/o completion ports). For
example, use Socket.BeginReceive, Stream.BeginRead, etc. (with matching
"End..." methods, of course).

I'm surprised to hear that MySql uses "one thread/client", and I'm curious
how you've confirmed this. Regardless, it's not possible to answer "how
can I get the same performance" without knowing the specific
implementation details for MySql, and of course knowing the specific
implementation details would tell you directly how to get the same
performance.

Finally, of course, you should ask yourself whether it's important. If
you only have a small number of clients and performance is acceptable,
then a high number of context switches may not be a problem. Yes, it
might mean the design could be more efficient, but if the code works and
performs up to your needs, complicating the design may not be the best
idea.

Pete

Thanks Pete!

I used the asynchronous API before, but my customer told me that the server
stopped by itself after running for 15 minutes.

Instead of writing a while(!Stop) loop in each thread, with the async
approach I call BeginRead() again in each callback, after EndRead(), to
create an endless loop.

Maybe this is a wrong idea, but I also worry: will asynchronous read/write
keep the server from responding in a timely manner?

I have 120 client machines connected to one server for the C/S application.
MySQL runs on the same server machine. Each client keeps one connection to
the db server as well. The clients and the server exchange very small
amounts of data, but very frequently.

I used the mysql command "show full processlist", and I can see about 120
threads. But about 119 of them are sleeping.

The application runs well most of the time, but once in a while the server
gets very busy and slow.

Thanks again!

Nicholas Paldino [.NET/C# MVP] · Jun 2, 2008

Ryan,

Sleeping at all isn't a good idea. I'd eliminate that code immediately.

I don't know why you are using QueueUserWorkItem to send the response
back. I mean, if it takes a good amount of time, then I can understand
breaking it down into smaller parts, but generally, I would write the
response back after you have processed the request in the callback from
reading the request using asynchronous IO.

Your needs for reducing context switches and maintaining the
responsiveness of the server are at odds with each other. If you want to
reduce the context switches, then you have to reduce the amount of
concurrent processing occurring on the machine. In doing that, you are going
to reduce the throughput, which is going to ultimately affect
responsiveness.

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

Ryan Liu · Jun 2, 2008

More info:

I used the performance counters and found that:

For the threads used by the mysql server on the same machine, context
switches are about 5-15 per second.
For my threads, very silly, it is about 43,384,322 - 4,295,039,822.

As for I/O data bytes/sec, mysql is 350,000 to 439,951. My server
application is 0. (Surprise, I need to check this again.)

My application's % Processor Time is also lower than mysql's at the moment
I watch.

The whole system's average disk queue length varies greatly, from 1.5 to
15.845.

The system thread count averages 881,049, and the max is 29,934,599. So
high! This is read from Control Panel - Administrative Tools - Performance.
I think in Task Manager the total thread count shown should be less than
1000.

Thanks!

Rene · Jun 2, 2008

Hey Nicholas, you are back!!

Where were you hiding? Did you take a looooooooong vacation?

It’s great to have you back!

Ryan Liu · Jun 2, 2008

Peter Duniho said:
[...]
I used asynchronous API before, but I was told by my customer, the server
stops itself after run for 15 minutes.

I don't know what that means. Were you instructed by your customer to
make the server stop after 15 minutes of execution? Or did your customer
complain to you that the server would only run for 15 minutes of
execution? In either case, how is that relevant to the current topic?

The customer complained to me. So I changed back to one thread per client
with the sync read/write approach, and haven't dared to try async again.
Oh, a virus was found before on the customer's network and maybe it is still
there. The customer says that the virus does no harm and they have had it
for years. They will kill it later, but for now my application must live
with it for months. :-( My customer is a sub-subsidiary of a Fortune 500
company.

Correct. You can abort a BeginRead() that hasn't completed yet by closing
the i/o object (e.g. call Stream.Close()). Then the i/o will complete
with an exception thrown when you call EndRead(). In that way you can
detect the closure of the object and do whatever cleanup you need to,
rather than calling BeginRead() again.

It should have the exact opposite effect. Using IOCP is the best way to
reduce CPU overhead and make the server run efficiently.

Well, the fact is that there are other things about your own
implementation that, at least without further clarification from you, seem
particularly troubling. It's certainly theoretically possible to manage
120 clients with a "one thread/client" implementation, but only if you do
it correctly.

If you are in fact not dedicating a single thread to a particular client's
read and write both, but rather (as it seems from the vague information
you've provided so far) polling for reads and queuing each write to the
thread pool individually, then that seems like that would in fact be one
of the worst ways to try to implement a "one thread/client" design.

Are you sure this is the worst way? Then I made a big mistake. I am doing
just that.
If you are pretty sure, then I will change the code -- either a dedicated
thread or try async I/O again.

From what we've heard so far, that doesn't seem surprising to me. You
should at a minimum fix your design so that you have a dedicated thread
for each client (don't use the thread pool), where that dedicated thread
handles both reads and writes with the client. Preferably, you should
just go ahead and use the async API, to take advantage of the efficiency
that IOCP will get you.

Pete

Thank you very much, Pete!

Ryan Liu · Jun 2, 2008

Peter Duniho said:
How many CPUs do you have? I have a hard time imagining how you could possibly
have 4 billion context switches in a second. Even 43 million seems
implausible if your polling thread (assuming you have one...you haven't
confirmed that yet) sleeps for 10ms each loop.

I have one dual-core CPU and 1 GB of memory (1.2 GB total in use with
virtual memory). I need to check with the customer exactly which CPU it is.
It runs on Server 2003.
Each thread sleeps 100 ms in each read loop, but I use the ThreadPool to
write back to clients.

What OS? On 32-bit Windows, I think the maximum number of threads,
assuming nothing else competing for resources, is something like 1000 per
process. Without an enormous number of processes, it's hard to see how
you could get up to 30 million threads (especially assuming that most
processes aren't themselves hosting an exorbitant number of threads).

On 64-bit Windows, my recollection is that the max is much higher of
course. In any case, I'd agree that even nearing 1 million threads seems
wrong. But again, how many CPUs are there on the hardware?

I reiterate: you really should move to the async API.

Pete

It is Windows Server 2003. And the process count in Task Manager is less
than 1000, as I remember. The unbelievable number is shown in Administrative
Tools - Performance. Maybe the numbers in the performance counters are
different.

Thanks!

Jon Skeet [C# MVP] · Jun 2, 2008

Can you elaborate? I wasn't aware there's new System.Threading stuff.
What did they add that helps?

It's not actually released yet, but there's a CTP available. I've been
trying it out with a few mini-projects which are embarrassingly
parallel - plotting the Mandelbrot set and Conway's Game of Life.
Parallel Extensions makes this really, really easy - it was pretty
much just a case of turning "for" into "Parallel.For" and rewriting
the body as a lambda expression. See my blog (last couple of posts)
for details - and search for "parallel extensions" to find a load more
stuff and the CTP.
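
For illustration, here is a minimal sketch of the "for" to "Parallel.For"
change described above. It assumes the Parallel.For overload from
System.Threading.Tasks as it later shipped in .NET 4, which may differ in
detail from the CTP; the ParallelSketch class and Squares method are
hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Sequential version for comparison:
    //   for (int i = 0; i < n; i++) { result[i] = i * i; }
    public static int[] Squares(int n)
    {
        int[] result = new int[n];

        // The loop body becomes a lambda. Each iteration writes only its
        // own slot, so the iterations are independent and need no locking.
        Parallel.For(0, n, i => { result[i] = i * i; });

        return result;
    }
}
```

This only pays off for independent, CPU-bound iterations like the
Mandelbrot and Game of Life examples; it is not a replacement for async
socket i/o.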

From what I've seen, it's going to be very, very good. In terms of
context switch avoidance, it uses work stealing to allocate tasks, and
doesn't create more threads than cores (at least in the normal case -
it may do more if it can detect that cores are idle due to IO-
bottlenecked tasks).

Jon

Jon Skeet [C# MVP] · Jun 2, 2008

On Jun 2, 7:36 am, "Peter Duniho" wrote:

Figures. The one post where you neglect to include your .sig (with the
blog link), and it's the one that has me looking for your blog.

Oops. That's the difference between posting with Google Groups and
posting with Gravity. (Must investigate whether Google Groups has sig
functionality lurking somewhere.)

Anyway, thanks...sounds very cool. I'll check it out when I get a
chance. (Don't worry about the blog link...I can grab it from another
post).

Might as well include it now though

http://msmvps.com/jon.skeet

Jon

Ryan · Jun 2, 2008

Peter Duniho said:
The customer complains to me. So I changed back to one thread per client
with sync read/write approach. And haven't dare try async again.
Oh, virus were found before on the customer's network and maybe still there.

Viruses on computers are a bad idea, no matter what. On the other
hand, it's possible you did something wrong in the async implementation
that caused the server to terminate prematurely. Or the virus could have
done it. One reason it's so bad to have a virus on the computer is that
when weird things happen, it's harder to know for sure it's your own code
that's causing the problem.

That said, if the virus were killing the server after 15 minutes with the
async implementation, I'd think it would with the current implementation
too. There's no way from here to know for sure, but it sounds like the
implementation itself might have had bugs.

[...]

If you are in fact not dedicating a single thread to a particular client's
read and write both, but rather (as it seems from the vague information
you've provided so far) polling for reads and queuing each write to the
thread pool individually, then that seems like that would in fact be one
of the worst ways to try to implement a "one thread/client" design.

Are you sure this is the worst way? Then I made a big mistake.
I am doing just that.

Sorry to say, yes. Think about what you're doing. On the one hand, it's
nice that you're sleeping. On the other hand, if you are sleeping between
each read, then you are forcing a context switch between each read. Even
if you're only sleeping when you run out of reads, you're still assigning
a different thread to each write operation. Again, this forces a context
switch for each write operation.

Basically, with each i/o operation, you force a context switch.

If you are pretty sure, then I will change the code -- either dedicated
thread or try async I/O again.

Please do. At the very minimum, take out the sleeps, and dedicate a
thread to each client. That way, the Windows thread scheduler can make
sure that any given thread will run as long as possible before a context
switch happens. Either the thread will run out of things to do, or it
will use up its timeslice when there's another thread ready to run.

Using the async API would be much better even still, but even switching to
a dedicated thread for each client would improve things a _lot_ as
compared to your current implementation.

[...]
Thank you very much, Pete!

You're welcome. Happy to help if I can.

Pete

[Sorry, called out and now access news group from public computer]

A long time ago, I remember that when I took out the sleeps, the server
became very busy and slow. Back then I only had 16 clients connected to the
server. Now I have 120.

The current code looks like this; the old code for 16 clients should be
similar:
while (true)
{
    if (!stream.DataAvailable) { Thread.Sleep(200); continue; }
    int got = stream.Read(buffer, offset, size);
    if (got == 0) { Thread.Sleep(200); continue; }
    // deal with data;
    // then send response back using ThreadPool;
}

In the async code, I don't use while(true), just

BeginReadCallBack()
{
    EndRead();
    // deal with the read msg;
    BeginRead(,,, new AsyncCallback(BeginReadCallBack)); // calls itself again
}

and there is no need to sleep anywhere. Is this the right way? I heard async
coding is complex; mine is so simple that it makes me doubt whether I wrote
it the wrong way.

Pete, from your response, I understand that async I/O is best, and dedicated
threads (120 threads) are better than what I have now. But I am still not
confident about taking the sleep out of each dedicated thread. I am afraid
it will keep the server busy.

I thought that if each thread sleeps longer in each loop, there would be
fewer context switches: while one thread sleeps, it gives other threads more
time to run, with no need to switch back to the first thread. Is this a
totally wrong idea?

Thanks again!

Ryan Liu · Jun 2, 2008

[I always see weird things: when I reply, it does not add '> ' to each line
for this particular message. I had to write some code to add it :-) ]

"Peter Duniho" wrote:

[...]
Long time ago, I remember if I take out sleeps, then the server will be
very busy and slow. Back then, I only have 16 clients connect to the
server. Now I have 120.

Current code looks like this, old code for 16 clients should be similar:
while (true)
{
    if (!stream.DataAvailable) { Thread.Sleep(200); continue; }
    int got = stream.Read(buffer, offset, size);
    if (got == 0) { Thread.Sleep(200); continue; }
    // deal with data;
    // then send response back using ThreadPool;
}

Right. This is polling, and this is awful.

Calling Thread.Sleep() helps avoid this particular thread from completely
consuming all available CPU time, and from starving other threads. But
the polling causes your context switching to actually increase, because if
you'd use a different i/o mechanism, that thread would be able to not run
at all until there was actually something to do. As it is, it has to wake
up every 200 ms just to see if there's data to be read, which causes a
context switch even when there's no data to be read.

Of course, the other bad thing that sleeping does is that it forces that
thread to wait as long as 200 ms before it can do any work, increasing
latency.

The one good thing I can say about that code is that at least you only
sleep when you believe you have nothing to do (though, a 0-byte read means
that the stream has closed, which you don't seem to deal with in the code
sample above). So at least the thread keeps the CPU as long as it can,
when it has work to do.

But otherwise, it's a terrible way to do i/o.

In the async code, I don't use while(true), just

BeginReadCallBack()
{
    EndRead();
    // deal read msg;
    BeginRead(,,, new AsyncCallback(BeginReadCallBack)); // call function self
}
and no need sleep anywhere. Is this right way? I heard async coding is
complex, mine is so simple and make me doubt I write in wrong way.

I know. But it really is just that simple. It's one of the reasons I
like the async API so much. Asynchronous coding can be complex in the
sense that you have to mentally accommodate the possibility of
multi-threaded access to data structures. But since you're using
threading already, you already have this complexity in your code, but
without the inherent advantage of simplicity that the async API otherwise
provides.
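
As a rough illustration of the BeginRead/EndRead pattern discussed here,
the following is a minimal sketch, not the code from the original server.
The ClientReader class and its onData/onClosed delegates are hypothetical
names introduced for the example; only Stream.BeginRead, Stream.EndRead and
Stream.Dispose are real framework calls. It also handles the two cases the
polling loop ignores: a 0-byte read (the remote end closed) and an exception
from EndRead (the stream was closed or failed while a read was pending).

```csharp
using System;
using System.IO;

// Hypothetical per-connection reader; the class name and the two delegates
// are assumptions for illustration only.
sealed class ClientReader
{
    private readonly Stream stream;
    private readonly byte[] buffer = new byte[4096];
    private readonly Action<byte[], int> onData;
    private readonly Action onClosed;

    public ClientReader(Stream stream, Action<byte[], int> onData, Action onClosed)
    {
        this.stream = stream;
        this.onData = onData;
        this.onClosed = onClosed;
    }

    // Posts a read. No thread is blocked while the read is pending.
    public void Start()
    {
        try
        {
            stream.BeginRead(buffer, 0, buffer.Length, OnRead, null);
        }
        catch (IOException) { Shutdown(); }
        catch (ObjectDisposedException) { Shutdown(); }
    }

    // Runs on a pool thread when the read completes.
    private void OnRead(IAsyncResult ar)
    {
        int got;
        try
        {
            got = stream.EndRead(ar);   // throws if the stream was closed or failed
        }
        catch (IOException) { Shutdown(); return; }
        catch (ObjectDisposedException) { Shutdown(); return; }

        if (got == 0) { Shutdown(); return; }   // 0 bytes: remote end closed

        onData(buffer, got);   // process the message; a reply can be written here
        Start();               // post the next read: this is the whole "loop"
    }

    private void Shutdown()
    {
        stream.Dispose();
        onClosed();
    }
}
```

One reader would be created per accepted connection, for example
new ClientReader(client.GetStream(), handler, cleanup).Start() for a
TcpClient. Callbacks for different clients can run on different threads, so
any state they share still needs the usual locking.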

Pete, From your response, I understand async I/O is best, dedicated
threads(120 threads) is better. But I am still not confident to take out
sleep from each dedicated thread. I am afraid it will keep server busy.

No. The async API is essentially a blocking API. That's one of the
things that makes it so useful. It has the same advantage that the
regular blocking API has (that is, you aren't consuming any CPU resources
unless there's actual i/o work to do) but the same advantage that a
multi-threaded implementation has: you can easily handle a relatively
arbitrary number of clients with code that is essentially the same as if
you were dealing with a single client (that is, the bulk of the code looks
the same as if you only had to deal with one client instead of many).

With the async API, there is no "dedicated thread". A thread is assigned
as needed to each i/o operation that completes. But it's done in a much
more efficient way than your current way of dealing with writes. You are
queuing each write operation to the thread pool, which adds a lot of
overhead. But the async API has threads that are specifically for dealing
with i/o operations (in this sense they are "dedicated", but you're not
the one dedicating them), and they just pull completed i/o operations
out of a queue as long as they exist.

If the queue is empty, there's nothing to do and Windows doesn't run any
of the threads. They simply block, consuming no CPU resources. If a
given thread already has the CPU, and there are i/o operation completions
in the queue, Windows will let that thread continue to run for as long as
is practical, rather than switching to a different thread.

Now, even if you dedicate a single thread to each client, you can avoid
the problems of the polling implementation you posted, by using the
blocking API without checking for data availability. Just call
Stream.Read(). It won't return until there's data available to read, and
the thread will not consume any CPU resources. Windows knows that the
thread can't do anything until the call to Read() completes, and won't
schedule that thread until then.
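
A minimal sketch of that dedicated-thread variant follows, assuming one
thread per client and a stream such as NetworkStream. The BlockingClientLoop
class and Serve method are hypothetical names, and echoing the bytes back is
only a placeholder for the real request handling.

```csharp
using System.IO;

static class BlockingClientLoop
{
    // Runs on the single thread dedicated to this client.
    // Returns the total number of bytes received before the stream closed.
    public static long Serve(Stream stream)
    {
        byte[] buffer = new byte[4096];
        long total = 0;
        int got;

        // No DataAvailable check and no Sleep: Read() blocks until data
        // arrives, and returns 0 once the other end has closed.
        while ((got = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += got;

            // Reply from this same thread rather than queuing the write to
            // the ThreadPool. Echoing is a placeholder for real processing.
            stream.Write(buffer, 0, got);
        }
        return total;
    }
}
```

With a TcpClient named client, this could be started as
new Thread(() => BlockingClientLoop.Serve(client.GetStream())).Start().
A real server would also catch IOException around the loop, since Read() and
Write() throw when the connection is reset.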

But using the async API you will avoid one problem that even the correct
"dedicated single thread/client" implementation would have: context
switches related to having multiple clients to service. With one thread
dedicated to each client, if you have i/o operations for multiple clients
that have completed, you will still be forced to have a context switch to
deal with each client, because each thread knows about only one client.
But using the async API, Windows is able to keep using the same thread to
handle i/o completions on multiple connections, avoiding any context
switches related to dealing with multiple clients.

I thought if in each loop of each thread, if it sleeps longer, there will be
less context switch. While one thread sleeps, it can give other threads more
time to run, no need switch back to or the first thread. -- this is totally
wrong idea?

Yes. It's not clear from your post whether that loop exists in a single
thread that manages multiple connections, or is being executed in multiple
threads, one thread per connection. But either way, the big problem with
the loop is that you are explicitly checking the DataAvailable property,
rather than just calling Read().

If you would just call Read(), then the thread would remain blocked until
there's data to be returned, and would not use any CPU time at all. But
as it is now, you only call Read() when you expect it to return right
away, which means that Windows has to keep scheduling that thread for
execution. Each thread is pretty much guaranteed to have to run 5 times a
second, even for a thread that's dealing with a connection that doesn't
have any i/o happening.
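
As a rough worked figure for that point, assuming the 120 polling threads
and the 200 ms sleep from the posted loop (the PollingCost helper is a
hypothetical name used only for this calculation):

```csharp
static class PollingCost
{
    // Wakeups per second caused purely by polling on idle connections:
    // each thread wakes once per sleep interval just to check for data.
    public static int IdleWakeupsPerSecond(int threads, int sleepMs)
    {
        return threads * (1000 / sleepMs);
    }
}

// PollingCost.IdleWakeupsPerSecond(120, 200) returns 600
```

So even with no traffic at all, the scheduler has to run a polling thread
about 600 times per second, and each of those wakeups implies at least one
context switch. With blocking Read() or the async API, that idle cost drops
to zero.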

In this scenario, calling Thread.Sleep() is certainly better than not
calling Thread.Sleep(). But you shouldn't be in "this scenario" in the
first place. Polling -- that is, checking the DataAvailable property --
isn't useful, and it causes the performance issues you're seeing.

Personally, I like the async API and that's how I'd do this. But,
assuming that you already have a dedicated thread for each connection, you
may be able to fix things just by taking out the check for DataAvailable,
taking out the calls to Thread.Sleep(), and taking out the queuing of
write operations to the ThreadPool (just do the writes from the same
thread that is handling reads). In other words, most of what's wrong with
your current implementation may well just be that you have too much code.

Pete

Thank you so much, Pete!

I will use async I/O, and not create any threads myself for the clients
(except the main thread that listens for and accepts incoming connection
requests), and give it a try. I will post back the results.

Is there anything in particular I should pay attention to with async I/O?

Ryan Liu · Jun 2, 2008

[I always see weird things, when I reply, it does not add '> ' for each line
for this particular message. I had to write a code to add it :-)

]

"Peter Duniho" <[email protected]> Ð´ÈëÏûÏ¢ÐÂÎÅ[email protected]...

[...]
Long time ago, I remember if I take out sleeps, then the server will be
very busy and slow. Back then, I only have 16 clients connect to the
server. Now I have 120.

Current code looks like this, old code for 16 clients should be similar:
while(true){
if(!stream.DataAvailable) {Thread.Sleep(200); continue};
int got = stream.Read(buffer, offset, size);
if(got ==0) {sleep(200);continue};}
//deal with data;
//then send response back using ThreadPool;
}

Click to expand...

Right. This is polling, and this is awful.

Calling Thread.Sleep() helps avoid this particular thread from completely
consuming all available CPU time, and from starving other threads. But
the polling causes your context switching to actually increase, because if
you'd use a different i/o mechanism, that thread would be able to not run
at all until there was actually something to do. As it is, it has to wake
up every 200 ms just to see if there's data to be read, which causes a
context switch even when there's no data to be read.

Of course, the other bad thing that sleeping does is that it forces that
thread to wait as long as 200 ms before it can do any work, increasing
latency.

The one good thing I can say about that code is that at least you only
sleep when you believe you have nothing to do (though, a 0-byte read means
that the stream has closed, which you don't seem to deal with in the code
sample above). So at least the thread keeps the CPU as long as it can,
when it has work to do.

But otherwise, it's a terrible way to do i/o.

In the async code, I don't use while(true), just

BeginReadCallBack()
{

EndRead();
deal read msg;

BeginRead(,,, new AsyncCallBack(BeginReadCallBack)); //call function
self
}
and no need sleep anywhere. Is this right way? I heard async coding is
complex, mine is so simple and make me doubt I write in wrong way.

Click to expand...

I know. But it really is just that simple. It's one of the reasons I
like the async API so much. Asynchronous coding can be complex in the
sense that you have to mentally accomodate the possibility of
multi-threaded access to data structures. But since you're using
threading already, you already have this complexity in your code, but
without the inherent advantage of simplicity that the async API otherwise
provides.

Pete, From your response, I understand async I/O is best, dedicated
threads(120 threads) is better. But I am still not confident to take out
sleep from each dedicated thread. I am afraid it will keep server busy.

Click to expand...

No. The async API is essentially a blocking API. That's one of the
things that makes it so useful. It has the same advantage that the
regular blocking API has (that is, you aren't consuming any CPU resources
unless there's actual i/o work to do) but the same advantage that a
multi-threaded implementation has: you can easily handle a relatively
arbitrary number of clients with code that is essentially the same as if
you were dealing with a single client (that is, the bulk of the code looks
the same as if you only had to deal with one client instead of many).

With the async API, there is no "dedicated thread". A thread is assigned
as needed to each i/o operation that completes. But it's done in a much
more efficient way than your current way of dealing with writes. You are
queuing each write operation to the thread pool, which adds a lot of
overhead. But the async API has threads that are specifically for dealing
with i/o operations (in this sense they are "dedicated", but you're not
the one dedicating them ), and they just pull completed i/o operations
out of a queue as long as they exist.

If the queue is empty, there's nothing to do and Windows doesn't run any
of the threads. They simply block, consuming no CPU resources. If a
given thread already has the CPU, and there are i/o operation completions
in the queue, Windows will let that thread continue to run for as long as
is practical, rather than switching to a different thread.
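
For illustration, here is a minimal sketch of that read/re-issue pattern in C#. The class name AsyncReader, the 4096-byte buffer, and the two delegates are assumptions made for the example, not anything from the posted code. Error handling is deliberately left out.

```csharp
using System;
using System.IO;

// Hypothetical per-connection reader; names and buffer size are illustrative.
public sealed class AsyncReader
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[4096];
    private readonly Action<byte[], int> _onData;  // receives (buffer, byteCount)
    private readonly Action _onClosed;             // called when the peer closes

    public AsyncReader(Stream stream, Action<byte[], int> onData, Action onClosed)
    {
        _stream = stream;
        _onData = onData;
        _onClosed = onClosed;
    }

    // Issues the first read. No thread waits while the read is pending.
    public void Start()
    {
        _stream.BeginRead(_buffer, 0, _buffer.Length, ReadCallback, null);
    }

    private void ReadCallback(IAsyncResult ar)
    {
        int got = _stream.EndRead(ar);
        if (got == 0)
        {
            _onClosed();   // a 0-byte read means the stream has closed
            return;        // do not issue another read
        }

        _onData(_buffer, got);  // only the first 'got' bytes are valid

        // Re-issue the read with this same callback; no loop, no Sleep().
        _stream.BeginRead(_buffer, 0, _buffer.Length, ReadCallback, null);
    }
}
```

With a TcpClient, the stream passed in would typically be client.GetStream(), with one AsyncReader instance per connection.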

Now, even if you dedicate a single thread to each client, you can avoid
the problems of the polling implementation you posted, by using the
blocking API without checking for data availability. Just call
Stream.Read(). It won't return until there's data available to read, and
the thread will not consume any CPU resources. Windows knows that the
thread can't do anything until the call to Read() completes, and won't
schedule that thread until then.
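
As a rough sketch of what that dedicated-thread loop could look like, assuming one thread per client: the class name, the buffer size, and the echo used as a stand-in "response" are all made up for the example.

```csharp
using System;
using System.IO;

// Hypothetical helper: the body of a thread dedicated to a single client.
public static class BlockingClientLoop
{
    // Reads from 'input' until the peer closes, writing a response to 'output'
    // from the same thread. Returns the total number of bytes handled.
    public static long Serve(Stream input, Stream output)
    {
        byte[] buffer = new byte[4096];
        long total = 0;

        while (true)
        {
            // Blocks until data arrives: no DataAvailable check, no Sleep().
            int got = input.Read(buffer, 0, buffer.Length);
            if (got == 0)
            {
                break;  // a 0-byte read means the stream has closed
            }

            total += got;

            // Placeholder response: echo the bytes back from this same thread
            // instead of queuing the write to the ThreadPool.
            output.Write(buffer, 0, got);
        }

        return total;
    }
}
```

For a real connection both arguments would be the same NetworkStream, for example Serve(stream, stream) on the thread created for that client.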

But using the async API you will avoid one problem that even the correct
"dedicated single thread/client" implementation would have: context
switches related to having multiple clients to service. With one thread
dedicated to each client, if you have i/o operations for multiple clients
that have completed, you will still be forced to have a context switch to
deal with each client, because each thread knows about only one client.
But using the async API, Windows is able to keep using the same thread to
handle i/o completions on multiple connections, avoiding any context
switches related to dealing with multiple clients.

I thought that if each thread sleeps longer in each loop, there will be
fewer context switches. While one thread sleeps, it gives the other threads
more time to run, with no need to switch back to the first thread. Is this a
totally wrong idea?


Yes. It's not clear from your post whether that loop exists in a single
thread that manages multiple connections, or is being executed in multiple
threads, one thread per connection. But either way, the big problem with
the loop is that you are explicitly checking the DataAvailable property,
rather than just calling Read().

If you would just call Read(), then the thread would remain blocked until
there's data to be returned, and would not use any CPU time at all. But
as it is now, you only call Read() when you expect it to return right
away, which means that Windows has to keep scheduling that thread for
execution. Each thread is pretty much guaranteed to have to run 5 times a
second, even for a thread that's dealing with a connection that doesn't
have any i/o happening.

In this scenario, calling Thread.Sleep() is certainly better than not
calling Thread.Sleep(). But you shouldn't be in "this scenario" in the
first place. Polling -- that is, checking the DataAvailable property --
isn't useful, and it causes the performance issues you're seeing.

Personally, I like the async API and that's how I'd do this. But,
assuming that you already have a dedicated thread for each connection, you
may be able to fix things just by taking out the check for DataAvailable,
taking out the calls to Thread.Sleep(), and taking out the queuing of
write operations to the ThreadPool (just do the writes from the same
thread that is handling reads). In other words, most of what's wrong with
your current implementation may well just be that you have too much code.

Pete

Thank you so much, Pete! Your long response clarifies lots of things for me.

I will use async I/O, and will not create any threads myself for clients
(except the main thread that listens for and accepts incoming connection
requests), and give it a try. I will post back the results.

Is there anything in particular I should pay attention to with async I/O?

Ryan Liu · Jun 2, 2008

[I always see weird things: when I reply, it does not add '> ' to each line
for this particular message. I had to write some code to add it :-)]

"Peter Duniho" <[email protected]> wrote in message news:[email protected]...

[...]
A long time ago, I remember that when I took out the sleeps, the server became
very busy and slow. Back then, I had only 16 clients connected to the
server. Now I have 120.

Current code looks like this; the old code for 16 clients should be similar:
while (true)
{
    if (!stream.DataAvailable) { Thread.Sleep(200); continue; }
    int got = stream.Read(buffer, offset, size);
    if (got == 0) { Thread.Sleep(200); continue; }
    // deal with data
    // then send response back using ThreadPool
}


Right. This is polling, and this is awful.

Calling Thread.Sleep() helps keep this particular thread from completely
consuming all available CPU time, and from starving other threads. But
the polling causes your context switching to actually increase, because if
you'd use a different i/o mechanism, that thread would be able to not run
at all until there was actually something to do. As it is, it has to wake
up every 200 ms just to see if there's data to be read, which causes a
context switch even when there's no data to be read.

Of course, the other bad thing that sleeping does is that it forces that
thread to wait as long as 200 ms before it can do any work, increasing
latency.

The one good thing I can say about that code is that at least you only
sleep when you believe you have nothing to do (though, a 0-byte read means
that the stream has closed, which you don't seem to deal with in the code
sample above). So at least the thread keeps the CPU as long as it can,
when it has work to do.

But otherwise, it's a terrible way to do i/o.

In the async code, I don't use while(true), just

BeginReadCallBack()
{
    EndRead();
    // deal with the message that was read
    BeginRead(,,, new AsyncCallback(BeginReadCallBack)); // re-issue the read with this same callback
}

Ryan Liu · Jun 4, 2008

Peter Duniho said:
[I always see weird things: when I reply, it does not add '> ' to each line
for this particular message. I had to write some code to add it]


That's because you're using Outlook Express and I'm using a news reader
that sometimes posts articles using "Quoted-Printable" as the transfer
type. Outlook Express fails to handle that format correctly, and simply
skips quoting text when composing replies to posts using that type.

There's a "quote helper" plug-in you can get for OE...I forgot the name,
but Google should be able to find it for you, especially with the above
information.

[...]
Thank you so much, Pete!


You're welcome.

I will use async I/O, and will not create any threads myself for clients
(except the main thread that listens for and accepts incoming connection
requests), and give it a try. I will post back the results.


For what it's worth, the listening API can be done async as well. It's
not strictly necessary, but if you're going to do async, why not go all
the way?
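
A minimal sketch of what async listening might look like, using TcpListener.BeginAcceptTcpClient. The AsyncAcceptor class, its Port property, and the onClient delegate are hypothetical names chosen for the example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical wrapper that accepts connections without a dedicated thread.
public sealed class AsyncAcceptor
{
    private readonly TcpListener _listener;
    private readonly Action<TcpClient> _onClient;

    public AsyncAcceptor(IPAddress address, int port, Action<TcpClient> onClient)
    {
        _listener = new TcpListener(address, port);
        _onClient = onClient;
    }

    // The port actually bound (useful when 0 was requested). Valid after Start().
    public int Port
    {
        get { return ((IPEndPoint)_listener.LocalEndpoint).Port; }
    }

    public void Start()
    {
        _listener.Start();
        _listener.BeginAcceptTcpClient(AcceptCallback, null);
    }

    public void Stop()
    {
        _listener.Stop();
    }

    private void AcceptCallback(IAsyncResult ar)
    {
        TcpClient client;
        try
        {
            client = _listener.EndAcceptTcpClient(ar);
        }
        catch (ObjectDisposedException) { return; }  // listener was stopped
        catch (SocketException) { return; }          // accept was aborted

        try
        {
            // Re-issue the accept before handling this client.
            _listener.BeginAcceptTcpClient(AcceptCallback, null);
        }
        catch (InvalidOperationException)
        {
            // Listener was stopped in the meantime; still hand off this client.
        }

        _onClient(client);
    }
}
```

The onClient delegate is where the first BeginRead for the new connection would be issued.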

Is there anything in particular I should pay attention to with async I/O?


Based on what you've already posted, I believe that you already
understand the basics. You have the essential pattern for use correct,
and you're already doing multi-threaded code so you (hopefully)
already understand what, if any, issues may come up with respect to
coordinating client-specific operations (which can generally all occur in
isolation within a thread) with any application-level operations (where
thread interactions might come into play, such as dealing with a GUI, or
passing computational operations off to some worker thread, etc.). The
async API doesn't introduce any new issues in that respect.

Of course, all of the usual TCP caveats apply, with respect to making sure
you're dealing with i/o as a stream. But if your code was already
basically working, again I would guess you're already familiar with these
issues.
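
To make the "stream" caveat concrete: a single Read() may return only part of a message, or the tail of one message plus the start of the next. The sketch below assumes a made-up wire format (a 4-byte big-endian length followed by the payload) and a made-up 1 MB size limit; neither comes from the application discussed here.

```csharp
using System;
using System.IO;

// Hypothetical framing helpers for a length-prefixed protocol.
public static class StreamFraming
{
    private const int MaxMessageSize = 1024 * 1024;  // assumed sanity limit

    // Keeps calling Read() until 'count' bytes have arrived.
    // Returns false if the stream ends before that.
    public static bool TryReadExactly(Stream stream, byte[] buffer, int count)
    {
        int filled = 0;
        while (filled < count)
        {
            int got = stream.Read(buffer, filled, count - filled);
            if (got == 0)
            {
                return false;  // stream closed mid-message
            }
            filled += got;
        }
        return true;
    }

    // Reads one message, or returns null if the stream ended first.
    public static byte[] ReadMessage(Stream stream)
    {
        byte[] header = new byte[4];
        if (!TryReadExactly(stream, header, 4))
        {
            return null;
        }

        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0 || length > MaxMessageSize)
        {
            throw new InvalidDataException("Unreasonable message length: " + length);
        }

        byte[] payload = new byte[length];
        if (!TryReadExactly(stream, payload, length))
        {
            return null;
        }
        return payload;
    }
}
```

The same idea applies with the async API, except that the partially filled buffer and the fill count have to be kept in per-connection state between callbacks.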

Pete

I rewrote the code and tested it in the client's environment.

I wrote 2 versions. One version still uses sync I/O, with one server thread per
client. I removed all sleeps, and use the same thread to write back to the
client. I also improved the I/O code. It improves performance. The CPU cost is 0
when serving 87 clients, and the memory cost is 30M (my code is small; I think
most of it is the .NET environment).

I improved the sync I/O part and removed the sleeps in the client program as
well. It also runs faster.

But after a few hours, the client CPU cost went up to 100%. I noticed that all
file dates had been changed. I think it is because of a virus. I am afraid the
virus is taking advantage of the new code (no sleep and more efficient I/O), so
I added the sleep back to the client programs. Now the CPU cost is 2-3% and it
runs OK. This happened on only one machine. Other clients run the client program
without sleep just fine.

A few more hours later, the server CPU cost went up to 98% too. After
restarting the server, it runs OK again.

I don't know why. Maybe the virus on the network is also taking advantage
of the new code, which has no sleep and higher I/O efficiency.

I also notice that, even when the server CPU cost is 0 and all clients work
fine, sometimes the server is very slow to respond to the service's
OnCustomCommand(), which runs on the server machine itself. I use it to dump
server info to a file. The threads that serve network clients can also write
logs to this file. I use a lock when writing to this file. This file is also
watched by another application, which uses a FileSystemWatcher and reads
(FileAccess.Read, FileShare.ReadWrite) the whole file content into a textbox.
Of course, since it is another application, its read operation is not locked.

----------
I also tried the new server with async I/O. After 80 clients connected, the
server crashed immediately. In the system event log, I found:

The PowerCatiAppService service terminated unexpectedly. It has done this 7
time(s).
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

I also see an application error in the event log:
EventType clr20r3, P1 powercatiappserver.exe, P2 1.0.3076.27877, P3
4844f3ee, P4 system, P5 2.0.0.0, P6 4333ae87, P7 282e, P8 18, P9
system.objectdisposedexception, P10 NIL.

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

I think this one is from the sync I/O version of the app, when it was using 100% CPU.

Thanks!

Ryan Liu · Jun 4, 2008

Peter Duniho said:
I rewrote the code and tested it in the client's environment.

I wrote 2 versions. One version still uses sync I/O, with one server thread per
client. I removed all sleeps, and use the same thread to write back to the
client. I also improved the I/O code. It improves performance. The CPU cost is 0
when serving 87 clients, and the memory cost is 30M (my code is small; I think
most of it is the .NET environment).

I improved the sync I/O part and removed the sleeps in the client program as
well. It also runs faster.


Good. All that sounds just as would be expected.

But after a few hours, the client CPU cost went up to 100%. I noticed that all
file dates had been changed. I think it is because of a virus. I am afraid the
virus is taking advantage of the new code (no sleep and more efficient I/O), so
I added the sleep back to the client programs. Now the CPU cost is 2-3% and it
runs OK. This happened on only one machine. Other clients run the client program
without sleep just fine.

A few more hours later, the server CPU cost went up to 98% too. After
restarting the server, it runs OK again.


This suggests, obviously, that adding a sleep doesn't help the high CPU
issue. Which isn't a surprise.

I don't know why. Maybe the virus on the network is also taking advantage
of the new code, which has no sleep and higher I/O efficiency.


Absent confirmed knowledge of a virus, as well as some specific
information as to how it operates, I would not blame this behavior on a
virus. Viruses don't _usually_ "take advantage of" architectural
characteristics of random programs. I can't rule it out 100% -- after
all, virus writers can be crafty bastards -- but it just seems
unlikely.

Instead, I think you might actually be looking for a "packrat" bug. That
is, some data structure that just keeps adding more and more things to
it. Especially if it's a data structure that you regularly scan.
Alternatively, it could be a bug where you have some escalating amount of
object instantiation. That is, as time goes by, some code that creates
multiple objects winds up creating more and more at a time. Even if they
are not "packrat"-ed, the cost of allocating and then disposing them may
consume excessive CPU time.

Without a concise-but-complete code example to look at, it's not possible
to say for sure what this might be. But I'd look for a bug in your own
code.

I also notice that, even when the server CPU cost is 0 and all clients work
fine, sometimes the server is very slow to respond to the service's
OnCustomCommand(), which runs on the server machine itself.


Again, it's hard to say what might be causing this. Just a random
thought: what are the power management settings on the server? Is the
server slow to respond only when it's been idle for some time? Windows
power management will let a disk spin down, and the time it takes to get
the disk back up to speed again can cause delays in operation. Where
latency is critical, it's important to make sure the power management
settings don't idle the disks.

If that's not the issue, then again without a concise-but-complete code
sample, diagnosing it is practically impossible. It could somehow be
related to the file locking, but it also might not be.

[...]
I also tried the new server with async I/O. After 80 clients connected, the
server crashed immediately. In the system event log, I found:

The PowerCatiAppService service terminated unexpectedly. It has done this 7
time(s).
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

I also see an application error in the event log:
EventType clr20r3, P1 powercatiappserver.exe, P2 1.0.3076.27877, P3
4844f3ee, P4 system, P5 2.0.0.0, P6 4333ae87, P7 282e, P8 18, P9
system.objectdisposedexception, P10 NIL.


Well, that sure looks like you're trying to access an object that's
already been disposed. That would certainly be wrong.
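
One way this can come up with the async API (an assumption, since the failing code wasn't posted) is closing a connection while a BeginRead is still outstanding: the callback still runs, and EndRead throws. Since .NET 2.0, an exception that goes unhandled on a thread-pool thread terminates the process, which would fit a service that "terminated unexpectedly". A small sketch of guarding the EndRead call follows; the helper name and the -1 return convention are invented for the example.

```csharp
using System;
using System.IO;

// Hypothetical helper for use at the top of a read callback.
public static class SafeAsyncRead
{
    // Returns the byte count from EndRead, or -1 if the stream had already
    // been closed or the connection failed while the read was pending.
    public static int EndReadOrFail(Stream stream, IAsyncResult ar)
    {
        try
        {
            return stream.EndRead(ar);
        }
        catch (ObjectDisposedException)
        {
            return -1;  // stream was closed locally while the read was pending
        }
        catch (IOException)
        {
            return -1;  // connection reset or other transport failure
        }
    }
}
```

A callback using this would treat -1 the same way as a 0-byte read: clean up that client and not issue another BeginRead.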

Again, without a concise-but-complete code sample, impossible to say why
your code is doing that.

Note that I keep mentioning a "concise-but-complete code sample". Of
course, it's highly unlikely anyone reading this newsgroup would take the
time to navigate your entire server or client code. However, if you can
write a very small test application that implements just the basic
networking pieces you're trying to get working, that would be useful for
consideration.

Note also that it may be useful to _you_ to write such an application even
if you never post it. It's a lot easier to learn the specific techniques
and to find and fix design and coding errors when you're dealing with a
small test application that has no extraneous functionality to distract
you.

In other words, you might consider doing your learning on such a small
test application, rather than trying to implement the whole thing in-place
on production code.

Pete

Thanks, Pete. I will ask the customer to keep an eye on the power management
issue. I have written small client/server test code and it tested OK. I don't
have the customer's environment with hundreds of computers, so I will think
about how to simulate it. If I find anything, I will post back here.

Thanks for pointing out directions where I can look for problems.
Ryan
