Async Socket IO Question

EmeraldShield

Hi all. I have been digging around trying to find an answer to a few
questions that are bugging me. I am hoping someone here can help.

1 - If you start an async I/O operation (BeginAccept) and the client hangs up,
what happens? The docs really don't tell you one way or the other. In my local
testing many of the sockets are never released, and I am not seeing my
delegate called. I used Interlocked.Increment on each Begin call and
Interlocked.Decrement in each delegate to track it; sometimes the callbacks
come through, and sometimes they don't. Very odd. What is the correct behavior?

2 - Why does the .Connected property not update? I have read the docs and
tried lots of tests. It still does not appear to work.

try
{
    // Attempt to ensure we are still connected...
    byte[] temp = { 0x00 };
    // I tried blocking and non-blocking; it didn't make a difference.
    sp.Client.Blocking = false;
    // This is the MS recommended way. It ALWAYS returns 0 for me and
    // Connected is not updated.
    int res = sp.Client.Send(temp, 0, 0);
    // I added this as an additional test and the same thing happens.
    // Always get back 0.
    res = sp.Client.Receive(temp, 0, SocketFlags.None);
    // If we get here we are still connected and alive...
}
catch (SocketException e)
{
    if (e.NativeErrorCode.Equals(10035))   // 10035 = WSAEWOULDBLOCK
    {
        // Still connected - the call would block
    }
    else
    {
        // We are disconnected
        TimeoutOccured(ref sp);
        return (false);
    }
}

This app is a socket server that remote clients connect to. I can manually
telnet to the app, watch it start a read, and then kill the telnet app. I
know the socket is gone. I look in the process list and telnet is gone.
But the reads and sends still report that it is valid, and .Connected still
reports true.

I run the test above in my routine before calling BeginReceive (I
figure there is no use beginning a receive if the client hung up), and
before sending data. It doesn't seem to make a difference.

What am I doing wrong?
 
Chris Mullins

I think you're mostly bumping into socket timeout issues.

If you kill the process of your client app, the TCP session isn't cleanly
shut down. Your server still thinks the connection exists. You can verify
this using "netstat -a". You'll see your connection still in there.

When your server sends to the client app, that send happens just fine (the
TCP session is still there). After a few moments, your TCP send will time out,
and you'll get an error. At this point the TCP session is torn down.

Just because it's amusing, I have had more bugs in socket shutdown code than
all the other areas of networking put together. There are so many ways, and
so many conditions, that can cause a TCP session to be torn down that it's
just depressing.

Here's an MS KB article that goes into how to adjust your timeouts:
http://support.microsoft.com/?kbid=170359

Note: I don't recommend adjusting your timeouts - but you do need to have a
solid understanding of what's going on.
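
For the connection check in the original question, here is a minimal sketch of one commonly used probe. It assumes a connected System.Net.Sockets.Socket; the names ConnectionProbe, LooksClosed and RunLoopbackDemo are made up for illustration and are not from this thread. Socket.Poll with SelectMode.SelectRead reports true when data is waiting or when the connection has been closed or reset, so "readable with zero bytes available" is treated as closed. It can only see a close that the local TCP stack already knows about: a peer that vanished silently still looks connected until a send times out, as described above.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper for illustration; not code from this thread.
public static class ConnectionProbe
{
    // Returns true when the local TCP stack already knows the connection is gone.
    public static bool LooksClosed(Socket socket)
    {
        try
        {
            // SelectRead is true when data is waiting OR the connection was
            // closed/reset. Zero bytes available rules out the first case.
            return socket.Poll(0, SelectMode.SelectRead) && socket.Available == 0;
        }
        catch (SocketException) { return true; }
        catch (ObjectDisposedException) { return true; }
    }

    // Loopback demo: returns { result before the client closes, result after }.
    public static bool[] RunLoopbackDemo()
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0); // port 0 = any free port
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            TcpClient client = new TcpClient();
            client.Connect(IPAddress.Loopback, port);
            using (Socket server = listener.AcceptSocket())
            {
                bool before = LooksClosed(server);
                client.Close();                               // clean close, so a FIN is sent
                server.Poll(1000000, SelectMode.SelectRead);  // wait up to 1 s for it to arrive
                bool after = LooksClosed(server);
                return new[] { before, after };
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

Calling ConnectionProbe.RunLoopbackDemo() from a console Main is one way to try it. Note that a zero-length Receive, as in the original test, returns 0 whether or not the peer has closed, which is why the sketch checks Available instead.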

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins


EmeraldShield

Logically I knew that... :) yes, the connection has not been closed and is
still pending timeout.

Thanks for the link.

But what happens if 500 people hit your socket and then all hang up? You
will get all the accepts and start receives on them. But then what happens?
Did that just kill your IOCP pool of threads?

Thanks.


Chris Mullins

[TCP Timeouts]
> Logically I knew that... :) yes, the connection has not been closed and
> is still pending timeout.

Glad that was easy to track down. Netstat can be your friend.

> But what happens if 500 people hit your socket and then all hang up? You
> will get all the accepts and start receives on them. But then what
> happens? Did that just kill your IOCP pool of threads?

The Async Socket Mechanism will keep chugging along just fine. Just because
you have 500 sockets in "BeginRead" does NOT mean you have 500 sockets
blocked inside a thread somewhere. Quite the opposite in fact.

I wrote up a fair bit of architecture around what we found to be the best
way to do this a while back:
http://www.coversant.net/dotnetnuke/Default.aspx?tabid=88&EntryID=10

Note: the architecture described there differs from what many people
recommend, especially some very well-known and well-respected authors on
the subject of scalability. I've personally talked with many of those
authors regarding this blog post, and had some very interesting discussions.
At the end of the day, though, the architecture described there really does
work and scale that high. The architectures many others recommend (which is
where we started) completely failed us in real-world usage.

We have scaled this up beyond belief. The Async stuff works very, very well.
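
To make the point about pending receives concrete, here is a minimal sketch of an async echo server built on BeginAccept/BeginReceive. It is not the architecture from the linked post; the class name AsyncEchoServer, the 4096-byte buffer and the Interlocked counter are assumptions for illustration, and the echo uses a blocking Send purely for brevity. Each connection parks one posted receive and no thread. When a client closes, the pending receive completes with 0 bytes (or EndReceive throws), so every BeginReceive is eventually matched by a callback and the counter returns to zero.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical sketch for illustration; not the architecture from the linked post.
public sealed class AsyncEchoServer
{
    private sealed class Connection
    {
        public Socket Socket;
        public byte[] Buffer = new byte[4096];
    }

    private readonly Socket _listener;
    private int _pendingReceives; // receives posted whose callback has not run yet

    public AsyncEchoServer(int port)
    {
        _listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        _listener.Bind(new IPEndPoint(IPAddress.Loopback, port)); // port 0 = any free port
        _listener.Listen(100);
        _listener.BeginAccept(OnAccept, null);
    }

    public int Port { get { return ((IPEndPoint)_listener.LocalEndPoint).Port; } }
    public int PendingReceives { get { return Volatile.Read(ref _pendingReceives); } }
    public void Stop() { _listener.Close(); }

    private void OnAccept(IAsyncResult ar)
    {
        Socket client = null;
        try { client = _listener.EndAccept(ar); }
        catch (SocketException) { }                  // this accept failed; keep listening
        catch (ObjectDisposedException) { return; }  // listener was closed

        try { _listener.BeginAccept(OnAccept, null); } // re-arm before handling the client
        catch (SocketException) { }
        catch (ObjectDisposedException) { }

        if (client != null) PostReceive(new Connection { Socket = client });
    }

    private void PostReceive(Connection conn)
    {
        // Count first: the callback may run before BeginReceive returns.
        Interlocked.Increment(ref _pendingReceives);
        bool posted = false;
        try
        {
            conn.Socket.BeginReceive(conn.Buffer, 0, conn.Buffer.Length,
                                     SocketFlags.None, OnReceive, conn);
            posted = true;
        }
        catch (SocketException) { }
        catch (ObjectDisposedException) { }

        if (!posted)
        {
            // No callback will fire for a Begin call that threw, so undo the count here.
            Interlocked.Decrement(ref _pendingReceives);
            conn.Socket.Close();
        }
    }

    private void OnReceive(IAsyncResult ar)
    {
        Connection conn = (Connection)ar.AsyncState;
        Interlocked.Decrement(ref _pendingReceives);

        int read = 0;
        try { read = conn.Socket.EndReceive(ar); }
        catch (SocketException) { }                  // reset or timed out: treat as closed
        catch (ObjectDisposedException) { return; }

        if (read == 0) { conn.Socket.Close(); return; } // 0 bytes = peer closed

        try { conn.Socket.Send(conn.Buffer, 0, read, SocketFlags.None); } // blocking, for brevity
        catch (SocketException) { conn.Socket.Close(); return; }
        catch (ObjectDisposedException) { return; }

        PostReceive(conn);
    }
}
```

One way to try it is new AsyncEchoServer(0), then connect a few hundred TcpClient instances to server.Port and watch PendingReceives rise while the process thread count stays roughly flat.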
 
EmeraldShield

Thanks Chris,

> The Async Socket Mechanism will keep chugging along just fine. Just
> because you have 500 sockets in "BeginRead" does NOT mean you have 500
> sockets blocked inside a thread somewhere. Quite the opposite in fact.

I have written quite a bit of IOCP code in C++ over the years, but not much
in C# yet. I guess I am going to have to really build some test cases and
beat on it like I did while learning the IOCP / C++ code. In fact many of
those old test apps I wrote may still be applicable... Didn't think about
that.

> I wrote up a fair bit of architecture around what we found to be the best
> way to do this a while back:
> http://www.coversant.net/dotnetnuke/Default.aspx?tabid=88&EntryID=10
>
> Note: the architecture described there differs from what many people
> recommend, especially some very well-known and well-respected authors on
> the subject of scalability. I've personally talked with many of those
> authors regarding this blog post, and had some very interesting
> discussions. At the end of the day, though, the architecture described
> there really does work and scale that high. The architectures many others
> recommend (which is where we started) completely failed us in real-world
> usage.
>
> We have scaled this up beyond belief. The Async stuff works very, very
> well.

Very interesting read. Non-intuitive. But you can't argue with the results.

I am puzzled as to how you were able to support 100,000 TCP connections under
NT though. AFAIK the system only allows 10,000 by default without registry
tweaks. And then beyond around 75,000 you start to run into paged pool
limits and vtables locked in RAM by the OS. Unless .NET does something
different under the hood from how we used to do it in C++.

Very encouraging results though. I am porting a smaller app of ours to C#
mostly for maintenance reasons. It is an older C++ app that needed to be
updated for new things anyway, so I decided to rearchitect it and move up to
C# at the same time.

We support 100,000+ connections on the C++ app. It will be interesting to see
what the C# app can do. If this works well we have a much larger app that I
will make the move with as well. The entire managed architecture is very,
very appealing to me after years of off-by-one crashes. :)

Thanks.
 
Chris Mullins

EmeraldShield said:
[Scaling Socket Apps]
> > We have scaled this up beyond belief. The Async stuff works very, very
> > well.
>
> Very interesting read. Non-intuitive. But you can't argue with the
> results.

Yea, there is that.

> I am puzzled as to how you were able to support 100,000 TCP connections
> under NT though.

Well, we've never tried under Windows NT. The OS we do our heavy lifting on
is Windows Server 2003.

> AFAIK the system only allows 10,000 by default without registry tweaks.

As a socket client application, there are default limits in the system. As a
socket server application, these limits don't really exist. Perhaps on NT,
but not (that I remember) on Win2K, and certainly not on Win2k3.

> And then beyond around 75,000 you start to run into paged pool limits and
> vtables locked in RAM by the OS.

You sure do. Each Async socket takes a certain amount of non-paged memory.
On a "big" 32-bit machine (4 GB of memory) we can do about 50k simultaneous
sockets. Things don't really scale that well there.

The key phrase there is "32-bit machine". On 64-bit hardware the limits are
much, much higher.

> Unless .NET does something different under the hood from how we used to
> do it in C++.

Nope. Same thing. 64-bit hardware is just bigger and badder:
http://www.coversant.net/bigiron.PNG

Low-end 64-bit hardware is practical today. A dual-core AMD machine with 4 GB
of memory and Windows Server 2003 x64 installed will do just fine for big
socket apps.

> We support 100,000+ connections on the C++ app. It will be interesting to
> see what the C# app can do. If this works well we have a much larger app
> that I will make the move with as well. The entire managed architecture
> is very, very appealing to me after years of off-by-one crashes. :)

If it's a simple socket server, you should be fine. When I was playing with
scalability limits (by writing nothing more than an Async Echo Server), the
limits were very high. I didn't document it anywhere, and don't really
remember what "very high" is though. :(

I do remember "very high" was such that finding enough client firepower to
max it out was difficult.
 
Chris Mullins

William Stacey [C# MVP] said:
> Thanks for writing that up Chris. The pattern is a good one.
> I have seen it called Half-Sync/Half-Async before:
> http://www.cs.wustl.edu/~schmidt/PDF/PLoP-95.pdf

That's interesting. I haven't seen that paper before, or heard the term, yet
it's very close to what we do.

As it sits now, it's fairly easy for developers to add new features into our
server code base - the majority of the code they have to write is
synchronous. It's not until things go Async that developers tend to get
quickly confused.

We've been looking to go more and more async, but given the synchronous
nature of database access (and the lack of support for an async model),
we're stuck in sync land for much of our processing.
 
William Stacey [C# MVP]

Another related idea would be to use the CCR library (still in beta and
currently distributed with the robotics library, but very good). It is
Port/Message based with its own thread pool(s) (it can also use the native
.NET thread pool) and allows all kinds of interesting stuff like
scatter/gather, port joins, etc. So for example, you could async wait on a
socket message which gets posted to a Port after the full message arrives.
Your port arbiter (i.e. delegate) now does work with the message. This work
could be kicking off 4 additional requests to different databases (or files,
etc.) and then starting an async wait on a Result Port for all operations to
return or error. Finally, return the result to the original requester. All
this can be done without blocking any threads on IO. Effectively, it allows
the same pattern with port queues and thread pools, but has it all built in
to the Port abstractions - very powerful.
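
The scatter/gather flow described above can be sketched without the CCR. The example below deliberately swaps in Task and async/await from the .NET base class library as a stand-in for CCR Ports, so it shows the shape of the pattern rather than the CCR API. The names ScatterGatherSketch, QueryAsync and HandleAsync, the four source names, and the Task.Delay that stands in for a real database or file request are all assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Stand-in sketch using Task/async-await, NOT the CCR API.
public static class ScatterGatherSketch
{
    // Hypothetical placeholder for one asynchronous database or file request.
    private static async Task<string> QueryAsync(string source, string request)
    {
        await Task.Delay(10);           // simulated I/O; no thread is blocked while waiting
        return source + ":" + request;
    }

    // Scatter one incoming message to four sources, then gather every reply.
    public static async Task<string> HandleAsync(string request)
    {
        string[] sources = { "db1", "db2", "db3", "db4" };

        // Scatter: start all four requests before waiting on any of them.
        Task<string>[] pending = sources.Select(s => QueryAsync(s, request)).ToArray();

        // Gather: resumes once all have completed; throws if any request failed.
        string[] results = await Task.WhenAll(pending);

        // Results come back in the same order as the tasks were passed in.
        return string.Join(",", results);
    }
}
```

A caller would await ScatterGatherSketch.HandleAsync(message) from its own receive path and send the combined string back to the original requester.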

--
William Stacey [C# MVP]
 
