Asynchronous Programming

William Stacey [MVP]

I gotta tell ya Chad, you made me see a new light on this. I went and put
together a virtual socket class that "creates" packets for reading and
created two tests. The first test creates X threads, with one socket per
thread that reads N packets. The second test creates the same number of
sockets in an object that manages the sockets in a collection and posts an
event that one or more worker threads wait on. Results vary based on
syncing threads with a shared object (e.g. a fake write queue) that you can
set to sleep and/or SpinWait. I could get ~20,000 threads running pretty
well. Trying to start 30,000 locked up my app each time, but I could just
stop it in Task Manager without a crash. I never saw the Task Manager
thread count go over ~571, so I'm not sure if that shows only running
threads or if the CLR only releases so many OS threads and does some other
management in the background. Very surprising results. There's no clear
winner in all cases, and I'm not sure of the effects of running 1000+
threads for a long time in a server app, but these results are interesting.
If anyone wants to play with this test harness, I can post it to the web.
It's fun to play with different options to see the timing effects of
injecting sleeps or waits, etc. Cheers!
 
Willy Denoyette [MVP]

Hmm, I would like to see some code.
With a default 1MB of stack space reserved per thread and a 2GB user address
space, the max number of threads is less than 2000; you can't have 20,000
threads created.
You said:
<I could get ~20,000 threads running pretty well. >
But Task Manager only showed ~571, which means only that number of OS
threads had actually been created. And from there the system started
thrashing, right? Creating additional threads then takes so much time that
you won't get any more threads from the OS.

Willy.

 
William Stacey [MVP]

Sounds strange, but it worked. .NET must be using fibers or something.
I'll post the code.

--
William Stacey, MVP

 
Willy Denoyette [MVP]

No, it doesn't use fibers. The problem is that the threads already created
(what you see in perfmon) put such a load on the system (CPU and page file
I/O) that new threads aren't getting created.
Please watch some perf counters like Process Threads, Process Virtual Memory
size and Page File size and you will understand why.

Willy.
 
William Stacey [MVP]

You're right, Willy. I botched it, as I found after looking harder. I
created the 20,000 threads, then started them one after another in a loop.
The first ones started and finished before the next couple started, and
they chased each other like that until the loop was done, so probably no
more than 10-30 actually ran at the same time. I fixed that and could get
about 1700, but never 1800, started at the same time. I will fix some more
to get better results. Actually, I am glad of this because it is more in
line with what I thought would happen before I started the test, and with
what I have read in many good books on the subject (so I don't have to burn
them now). I will post the code anyway once I tighten a few things up,
along with some numbers. As ~1700 seems to be the max (with hardly anything
else going on), I would venture a guess that 100-500 may be ~workable in a
running server for a thread-per-client approach (not sure at this point).
That means (I think) a socket collection with an Event at the head of a loop
is the ~only way to do high connection count (500+) TCP servers. BTW, how
does IIS handle it?

Thanks for post. Cheers!
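
The measurement mistake described in this post (threads started one after another, with the early ones finishing before the later ones begin) can be reproduced in a small sketch. This is Python rather than the .NET harness from the thread, which was never posted here; the `PeakCounter` class and `run_threads` function are hypothetical names used only for illustration. A barrier forces every thread to be alive at once, so the recorded peak equals the number of threads created.

```python
import threading


class PeakCounter:
    """Tracks how many workers are alive at once and the highest value seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self):
        with self._lock:
            self.current -= 1


def run_threads(count, hold_until_all_started):
    """Start `count` threads and return the peak number alive together."""
    counter = PeakCounter()
    # With the barrier, no worker may finish until all `count` have entered.
    barrier = threading.Barrier(count) if hold_until_all_started else None

    def worker():
        counter.enter()
        if barrier is not None:
            barrier.wait()
        counter.leave()

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.peak


unsynchronized_peak = run_threads(50, hold_until_all_started=False)
synchronized_peak = run_threads(50, hold_until_all_started=True)
print("peak without barrier:", unsynchronized_peak)
print("peak with barrier:", synchronized_peak)
```

Without the barrier the peak is whatever the scheduler happens to allow, which is why a count of "threads started" says little about how many ran at the same time.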
 
Chad Z. Hower aka Kudzu

William Stacey said:
> up and post some numbers. As ~1700 seems to be max (without hardly

Yes, it is because of process limits. Each thread is allocated 2M of process
space, and in a 32-bit space 4G is the limit. So that's 2000, but a process
already has allocated space... so 1700 is about right.

In XP you can create threads with smaller process spaces to bump it higher.
I'm not sure if .NET threads have this option.

> anything else going on), I would venture a guess that 100-500 may be
> ~workable in a running server for a thread per client approach (not sure

100-500 is very workable. I have servers running 24/7/365 with that many
threads in many, many installations with no issues and no major CPU drain.

Even 1,000 is feasible if you don't build in your own bottlenecks. In most
cases interthread communication or contention in user code is the
bottleneck long before the threads are.

> as this point). That means (I think) a socket collection with an Event
> at the head of a loop is the ~only way to do high connection (500+)

Nope.... As I said, even 1000 is OK. And I've gone as high as 1500
successfully.

> count TCP servers. BTW - How does IIS work it?

IIS uses IOCP and thread pools. Remember IIS is HTTP, and HTTP is "hit and
go", so it doesn't need to keep threads around for "spurious" connections.
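
To illustrate the thread pool half of that answer, here is a minimal Python sketch in which a small fixed pool serves many short-lived requests. IOCP itself is a Windows kernel facility and is not shown. The pool size, the request count and the `handle_request` function are assumptions chosen for the demo, not details of how IIS is configured.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4        # assumed pool size, chosen only for the demo
REQUEST_COUNT = 100  # assumed number of short-lived requests


def handle_request(request_id):
    """Stand-in for a short "hit and go" request.

    Returns the request id together with the name of the pool thread
    that served it.
    """
    return request_id, threading.current_thread().name


# The pool keeps at most POOL_SIZE threads and reuses them across requests.
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(handle_request, range(REQUEST_COUNT)))

threads_used = {thread_name for _, thread_name in results}
print(f"{len(results)} requests served by {len(threads_used)} thread(s)")
```

Because each request finishes quickly, the number of threads stays bounded by the pool size no matter how many requests arrive.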

Indy 10 has an option to use fibers and IOCP. The cool thing is it's all
hidden from the user, and they can switch from threads + Winsock to fibers +
IOCP without changing a single line of their code.

And since Indy 10 has its own scheduler built in, it's VERY efficient. It
has not been optimized yet, but it's been through some serious testing and
will push to 10,000+, and when done will likely push to 40,000 or more (the
socket limit of Windows) on a machine with enough RAM to allocate sockets.
So in Indy 10 the limit now is Windows and how many sockets it can
allocate, not the threads.

The problem at 40,000 again becomes one of just sheer memory though...
64-bit will help here.
 
William Stacey [MVP]

> Nope.... As I said even 1000 is ok. And I've gone as high as 1500
> successfully.

Well, I was thinking more in terms of being safe and not pushing things
until it crashes, as we may not have perfect knowledge up front of all the
memory issues, async delegates and other I/O (disk, etc.). If 1700 is around
the max, then I would not feel real sure about pushing 1000, as that gets
close, and you may have other async stuff going on plus the rest of the
memory and allocations your program is doing. Not to mention other network
programs the server could be running, assuming we don't have 100% rule over
the box for one program. If you had to pick some number, it would probably
be less than 1000, with some rounded-down fudge factor for safety. That
said, you end up having to pick a max number and err on the safe side. So
if you settle on 700 but need 1000-10,000+ max connections, the socket
collection with event is the only .NET/Win32 way I see at this point without
going to IOCP (not assuming Indy, for the sake of discussion).
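
A rough sketch of the "socket collection with an event at the head of a loop" idea follows. It is written in Python, with the standard `selectors` module standing in for the Win32 event mechanism, so it is an analogy and not the .NET design described above. The function name, the connection count and the use of local socket pairs in place of real clients are all assumptions for illustration.

```python
import selectors
import socket


def read_all_with_one_thread(connection_count):
    """Read one small packet from each of many sockets using a single loop."""
    selector = selectors.DefaultSelector()
    senders = []
    for conn_id in range(connection_count):
        # Local socket pairs stand in for accepted client connections.
        receiver, sender = socket.socketpair()
        receiver.setblocking(False)
        # The selector holds the socket collection; `data` tags each socket.
        selector.register(receiver, selectors.EVENT_READ, data=conn_id)
        senders.append(sender)

    for conn_id, sender in enumerate(senders):
        sender.sendall(f"packet {conn_id}".encode())

    received = {}
    while len(received) < connection_count:
        # Head of the loop: block until at least one socket is readable.
        for key, _events in selector.select():
            # Packets are tiny here, so one recv is assumed to return it whole.
            received[key.data] = key.fileobj.recv(1024).decode()
            selector.unregister(key.fileobj)
            key.fileobj.close()

    selector.close()
    for sender in senders:
        sender.close()
    return received


packets = read_all_with_one_thread(50)
print(len(packets), "connections handled by one thread")
```

The number of connections is then limited by sockets and memory, not by per-thread stack reservations.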
> And since Indy 10 has its own scheduler built in - its VERY efficient. Its
> not been optimized yet but its been through some serious testing and will
> push to 10,000+ and likely when done push to 40,000 or more (the socket
> limit of Windows) on a machine with enough RAM to allocate sockets. So in
> Indy 10 - the limit now is Windows and how many sockets it can allocate,
> not the threads.

Cool. Thanks Chad.

--wjs
 
Willy Denoyette [MVP]

Chad Z. Hower aka Kudzu said:
> Yes it is because of process limits. Each thread is allocated 2M of process
> space, and in a 32 bit space 4G is the limit. So that's 2000, but a process
> already has allocated space... so 1700 is about right.

User space is 2GB (half of the 4GB process space), or 3GB on
"LargeAddressAware" enabled systems (3GB switch). Default stack space is 1MB
per OS thread. .NET doesn't expose a managed way to create threads with less
stack space; on XP and higher, one can always call CreateThread using
PInvoke.
The default stack space can be changed using EDITBIN.EXE.

Willy.
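
As a rough illustration of the arithmetic behind these limits, the Python sketch below divides the user address space by the stack reserved per thread. It gives only an upper bound. The `thread_ceiling` function and the 300MB overhead figure are assumptions made for illustration, not measurements from this thread, and real limits also depend on fragmentation and on everything else the process allocates.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def thread_ceiling(user_space, stack_reserve, already_reserved=0):
    """Upper bound on thread count when each thread reserves
    `stack_reserve` bytes of address space (hypothetical helper)."""
    usable = max(user_space - already_reserved, 0)
    return usable // stack_reserve


# 2GB user space with the default 1MB stack reservation.
ceiling_2gb = thread_ceiling(2 * GIB, 1 * MIB)

# 3GB user space (3GB switch) with the same stack size.
ceiling_3gb = thread_ceiling(3 * GIB, 1 * MIB)

# Assumed figure: 300MB of address space already taken by the runtime,
# loaded modules and heaps. The real overhead varies per process.
ceiling_with_overhead = thread_ceiling(2 * GIB, 1 * MIB, already_reserved=300 * MIB)

print(ceiling_2gb, ceiling_3gb, ceiling_with_overhead)  # 2048 3072 1748
```

With some address space already in use, the bound lands in the same range as the ~1700 threads observed earlier in the thread.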
 
Chad Z. Hower aka Kudzu

Willy Denoyette said:
> User space is 2GB (half of the 4GB process space) or 3GB on
> "LargeAddressAware" enabled systems (3GB switch), default stack space is

Aah yes. Sorry, you are right on the 1M and 2G. It was late last night. ;)

> 1MB per OS thread, .NET doesn't expose a managed way to create threads

This is a shame. :(

> The default Stackspace can be changed using EDITBIN.EXE.

EditBin? You mean as in editing the .NET framework?
 
Jon Skeet [C# MVP]

Chad Z. Hower aka Kudzu said:
> EditBin? You mean as in editing the .net framework?

No - editing the assembly, which is still just a PE file after all. I
believe it can be done, but it's not recommended :)
 
Willy Denoyette [MVP]

Inline ***

Willy.

Chad Z. Hower aka Kudzu said:
> Aah yes. Sorry you are right on the 1M and 2G. Was late last night. ;)

> This is a shame. :(

*** Not really, except for special cases where you know exactly how much
stack space will be required. But in that case I assume you will write your
own CLR host, just like ASP.NET does.

> EditBin? You mean as in editing the .net framework?

*** No, the executable assembly, which is just a normal PE file.
So, editbin /stack:256000 YourAssembly.exe will set the default reserved
stack size to roughly 256 KB.
The problem here is that all threads will take the same value. A low value
may lead to stack overflow exceptions being thrown, while a value set too
high is just a waste of virtual memory. So IMO it's better to leave the
value at the 1MB default.
 
Chad Z. Hower aka Kudzu

Willy Denoyette said:
> *** No, the executable assembly, which is just a normal PE file.
> So, editbin /stack:256000 YourAssembly.exe will set the default
> reserved stack size to 256Kb.

Aah, by setting the process one. That's how it's done pre-XP too.

> Now the problem here is that all threads will take the same value, a low
> value may lead to stackoverflow exceptions being thrown, when set too
> high its just a waste of virtual memory. So IMO it's better to leave the
> value at 1MB default.

In XP you can set it per thread - and it works very nicely.
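
The difference between a process-wide default and a per-thread stack size can be sketched in Python, purely as an analogy; this is not the Win32 CreateThread call or any .NET API. Python's `threading.stack_size` applies to threads created after the call, so setting it just before one thread starts and restoring it afterwards gives that thread its own size. The helper name and the 256 KiB value are assumptions for illustration, and the approach is not safe if other threads are being created concurrently.

```python
import threading


def run_in_thread_with_stack(target, stack_bytes):
    """Run `target` in a new thread created with `stack_bytes` of stack,
    then restore the previous setting (illustrative helper)."""
    previous = threading.stack_size()   # 0 means "platform default"
    threading.stack_size(stack_bytes)   # applies to threads created next
    outcome = []
    worker = threading.Thread(target=lambda: outcome.append(target()))
    try:
        worker.start()                  # the stack size is fixed at creation
    finally:
        threading.stack_size(previous)  # later threads get the old setting
    worker.join()
    return outcome[0]


SMALL_STACK = 256 * 1024  # assumed value; Python requires at least 32 KiB
total = run_in_thread_with_stack(lambda: sum(range(1000)), SMALL_STACK)
print(total)  # 499500
```

Only threads known to need little stack should get a small reservation, since a value that is too low risks a stack overflow, as noted above.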
 
Chad Z. Hower aka Kudzu

William Stacey said:
> Well I was thinking more in terms of being safe and not pushing things
> until it crashes as we may not have perfect knowledge upfront of all the
> mem issues and async delegates and other IO (disk, etc). If 1700 is
> around max, then I would not feel real sure about pushing 1000 as that
> gets close, and you may have other async stuff going on and the rest of

1000 is not close to 1700. :)

When my son is 10, I won't be worried about him "almost" being 17. :)
 
Willy Denoyette [MVP]

inline ***
Willy.

Chad Z. Hower aka Kudzu said:
> Aah by setting the process one. Thats how its done pre XP too.

*** Yep.

> In XP you can set it per thread - and it works very nicely.

*** Sure, but not from within .NET, unless you PInvoke or use MC++.
 
William Stacey [MVP]

Ok wise guy. Let's play "pick a number" then. What's your number?
Factor in some working space for your actual program data, etc. And factor
in some fudge factor so as not to always be bumping the max. When you
push the memory line, strange things seem to happen. So it's not 1700 or
1699, or even 1698. However, it is probably > 700 and less than 1700.
 
D. Yates

Chad,
> Indy 10 has an option to use fibers and IOCP. The cool thing is its all
> hidden from the user and they can switch from threads + winsock to fibers +
> IOCP without changing a single line of their code.


Are you still writing Delphi code or have you switched to C#?

Is Indy 10 written in C# or still in Delphi?

Dave
 
Chad Z. Hower aka Kudzu

William Stacey said:
> Ok wise guy. Then lets play "pick a number" then. What's your number?
> Factor in some working space for your actual program data, etc. And factor
> in some fudge factor so as not to always be bumping max. Seems when you
> push the memory line, strange things seem to happen. So its not 1700 or
> 1699, or even 1698. However it is probably > 700 and less then 1700.

1699 and so on, yes, but you stated 1000 was almost 1700. That's what I was
replying to.
 
Chad Z. Hower aka Kudzu

D. Yates said:
> Are you still writing Delphi code or have you switched to C#?

Still Delphi. We use C# in our company for demos and some VS-specific code,
otherwise still all Delphi. Only Delphi allows us to support Win32, .NET and
Linux with one source code base.

> Is Indy 10 written in C# or still in Delphi?

Delphi. I think you underestimate the sheer size and volume of code in Indy
if you think it could be ported to C# so easily. :)
 
Chad Z. Hower aka Kudzu

William Stacey said:
> Ok wise guy. Then lets play "pick a number" then. What's your number?
> Factor in some working space for your actual program data, etc. And factor
> in some fudge factor so as not to always be bumping max. Seems when you

Here is the quote BTW:
"If 1700 is around max, then I would not feel real sure about pushing 1000 as
that gets close,"

I don't dispute that 1000 may be too much in most situations. I simply
disputed the statement that 1000 is close to 1700.
 
