Custom Thread Pool (by Mr. Jon Skeet) Enhancement

Kieran Benton

Hello,
I'm currently in the final stages of developing a highly scalable
server for my employer. Essentially it receives short socket-based
connections with an ASCII message, parses that message, does some
processing and then sends out a string reply on the same connection.

I'm using the asynchronous IO completion port based socket methods in
.NET 2.0 to handle the comms side of things (giving extremely good
performance) and the venerable Mr. Skeet's custom threadpool to execute
the processing portion of the system. My problem is that not all of the
messages are created equal. Some simply do a very quick query from a DB
(or even a cache), whereas others do a credit card authorisation or
some longer SQL work.

Because of this I've tried to separate my messages into two classes,
short running and long running, running them on two separate
threadpools. It's become very hard to manage how many threads I allocate
to each: as all messages involve some I/O blocking, I want to avoid
starving the pools as much as possible to keep latency low.

Based on this article:
http://blogs.msdn.com/cbrumme/archive/2004/02/21/77595.aspx (Threading
and Synchronization, near the top - he explains the concept much better
than I ever could), I've been thinking about expanding Jon's ThreadPool
to automatically determine its own maximum number of threads to keep
CPU usage high (by increasing the number of threads) whilst not choking
the machine to death with too many context switches. This is apparently
a big simplification of what MS SQL Server does internally, and I'd
like to replace my two pools with just one of this nature.

Essentially I'd just like anyone's comments (especially yours Jon!) as
to whether this is worthwhile, or how to go about it - I've got some
ideas of my own:
1. On completion of a work item, check the CPU usage and decide whether
to create a new thread, keep the thread count the same, or suspend the
thread?
2. Add a timer (100ms?) that checks the CPU periodically and sets the
max and min thread values? How does CPU usage work with multiple CPUs?
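
Editor's sketch of idea 2, written in Python purely for illustration (none of this is Jon's threadpool code or a .NET API). It assumes a timer supplies one utilisation sample per core in the range 0.0 to 1.0, averages them into a single machine-wide figure (one possible answer to the multiple-CPU question), and nudges the thread count by one per tick. The names `SizingLimits`, `average_cpu` and `next_thread_target`, and the 70%/90% thresholds, are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SizingLimits:
    """Hypothetical tuning knobs; the defaults are illustrative only."""
    min_threads: int = 2
    max_threads: int = 64
    low_cpu: float = 0.70   # below this, blocked threads are probably leaving CPU idle
    high_cpu: float = 0.90  # above this, extra threads mostly add context switches


def average_cpu(per_core: Sequence[float]) -> float:
    """Collapse per-core utilisation (0.0-1.0 each) into one machine-wide figure."""
    return sum(per_core) / len(per_core)


def next_thread_target(current: int, cpu: float, queued: int,
                       limits: SizingLimits) -> int:
    """Return the thread count to aim for after one timer tick."""
    if cpu < limits.low_cpu and queued > 0:
        target = current + 1   # work is waiting and the CPU has headroom
    elif cpu > limits.high_cpu:
        target = current - 1   # CPU is saturated, so shed a thread
    else:
        target = current       # inside the dead band: leave the pool alone
    # Never step outside the configured floor and ceiling.
    return max(limits.min_threads, min(limits.max_threads, target))


limits = SizingLimits()
cpu = average_cpu([0.20, 0.30, 0.25, 0.25])  # four cores, mostly idle
print(next_thread_target(current=8, cpu=cpu, queued=5, limits=limits))
```

The gap between the two thresholds acts as a dead band, so the pool does not oscillate when the CPU hovers around a single cut-off. Moving by one thread per tick is deliberately conservative; whether that reacts quickly enough is something only measurement under real load can show.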

Its getting to be a bit of a pain as I have to tune the system as it is
to the machine it runs on and the type of load it encounters!

Thanks everyone!
 
Jon Skeet [C# MVP]

Kieran Benton said:
> I'm currently in the final stages of developing a highly scalable
> server for my employer. Essentially it receives short socket-based
> connections with an ASCII message, parses that message, does some
> processing and then sends out a string reply on the same connection.
>
> I'm using the asynchronous IO completion port based socket methods in
> .NET 2.0 to handle the comms side of things (giving extremely good
> performance) and the venerable Mr. Skeet's custom threadpool to execute
> the processing portion of the system. My problem is that not all of the
> messages are created equal. Some simply do a very quick query from a DB
> (or even a cache), whereas others do a credit card authorisation or
> some longer SQL work.
>
> Because of this I've tried to separate my messages into two classes,
> short running and long running, running them on two separate
> threadpools. It's become very hard to manage how many threads I allocate
> to each: as all messages involve some I/O blocking, I want to avoid
> starving the pools as much as possible to keep latency low.

Right. Could you not use asynchronous IO again, and keep the thread pool
for actual processing, so that any thread which is in the pool is
either running or available, not blocking in a useless way? It's no
doubt relatively tricky to write the code that way, but it sounds like
the way to get the maximum performance out.
> Based on this article:
> http://blogs.msdn.com/cbrumme/archive/2004/02/21/77595.aspx (Threading
> and Synchronization, near the top - he explains the concept much better
> than I ever could), I've been thinking about expanding Jon's ThreadPool
> to automatically determine its own maximum number of threads to keep
> CPU usage high (by increasing the number of threads) whilst not choking
> the machine to death with too many context switches. This is apparently
> a big simplification of what MS SQL Server does internally, and I'd
> like to replace my two pools with just one of this nature.

Have you tried just using a relatively large pool, and measuring how
much time is actually spent context switching? Do you definitely have a
problem?
> Essentially I'd just like anyone's comments (especially yours Jon!) as
> to whether this is worthwhile, or how to go about it - I've got some
> ideas of my own:
> 1. On completion of a work item, check the CPU usage and decide whether
> to create a new thread, keep the thread count the same, or suspend the
> thread?
> 2. Add a timer (100ms?) that checks the CPU periodically and sets the
> max and min thread values? How does CPU usage work with multiple CPUs?
>
> It's getting to be a bit of a pain, as at the moment I have to tune the
> system to the machine it runs on and the type of load it encounters!

You certainly could take either of those approaches, but I don't know
whether they'd really do what you want them to. I'd definitely try
profiling with a few different sizes of threadpool first.
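
Editor's sketch of the "try a few pool sizes and measure" suggestion, in Python for illustration only. It times the same batch of simulated blocking jobs on pools of different sizes using the standard `concurrent.futures.ThreadPoolExecutor`. The job (`blocking_job`, a plain sleep) and the helper `time_pool` are hypothetical stand-ins; a real test would run the actual message handlers under realistic load and would also watch the operating system's context-switch counters, which this sketch does not do.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_job(seconds: float) -> float:
    """Stand-in for a message handler that mostly waits on I/O."""
    time.sleep(seconds)
    return seconds


def time_pool(pool_size: int, durations: list) -> float:
    """Run every job on a pool of the given size; return wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # list() forces every result, so the timer covers the whole batch.
        list(pool.map(blocking_job, durations))
    return time.perf_counter() - start


if __name__ == "__main__":
    batch = [0.01] * 20  # twenty short "I/O-bound" messages
    for size in (1, 4, 16):
        print(f"{size:>2} threads: {time_pool(size, batch):.3f}s")
```

Because the simulated jobs only sleep, larger pools will always look better here. The interesting numbers come from repeating the run with the CPU-heavy message mix described above, where a large pool is expected to lose ground.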
 
Kieran Benton

Hi Jon,
I'm certain we do have a problem. Sometimes we are processing
exclusively short-lived, high CPU/low blocking messages, for which a
large threadpool is less efficient; at other times low CPU/long
blocking messages, where a larger threadpool is more performant. Or a
mixture of the two. We've done plenty of tests on production servers
with built-in performance metrics and performance counters collecting
our data.

As to using async IO, I assume you mean the new stuff for DB access? We
are using this where possible, although as you say it is a royal pain
in the arse for some situations! Unfortunately one of our longest
running (and most common) messages involves using a COM object whose
internals I have no control over, so I'm kind of stuck with that. I agree
though that async DB access is the way to go wherever it is humanly
possible.

I'm interested in your thoughts on when your threadpool should decide
whether to create a new thread and whether to destroy (or maybe
even just suspend?) a thread. Another niggle in my mind is whether to
work out some kind of metric in terms of remaining memory and CPU, or
whether to just monitor the CPU. Hopefully I'll get started on this
soon and be able to post some benchmark results.
 
Jon Skeet [C# MVP]

Kieran Benton said:
> I'm certain we do have a problem. Sometimes we are processing
> exclusively short-lived, high CPU/low blocking messages, for which a
> large threadpool is less efficient; at other times low CPU/long
> blocking messages, where a larger threadpool is more performant. Or a
> mixture of the two. We've done plenty of tests on production servers
> with built-in performance metrics and performance counters collecting
> our data.

Right. Oh dear :(
> As to using async IO, I assume you mean the new stuff for DB access? We
> are using this where possible, although as you say it is a royal pain
> in the arse for some situations! Unfortunately one of our longest
> running (and most common) messages involves using a COM object whose
> internals I have no control over, so I'm kind of stuck with that. I agree
> though that async DB access is the way to go wherever it is humanly
> possible.

Yes - although it does tend to be a pig to code.
> I'm interested in your thoughts on when your threadpool should decide
> whether to create a new thread and whether to destroy (or maybe
> even just suspend?) a thread. Another niggle in my mind is whether to
> work out some kind of metric in terms of remaining memory and CPU, or
> whether to just monitor the CPU. Hopefully I'll get started on this
> soon and be able to post some benchmark results.

Hmm. Yes, it sounds like a CPU monitor *might* work. If I were you, I'd
try adapting my threadpool to use some kind of interface which it asks
about whether or not to kill a thread and whether or not to create one
- then one could have different policies for different situations.
Easier said than done, of course...
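
Editor's sketch of that pluggable-policy idea, in Python for illustration only; it is not the custom threadpool's real design. A hypothetical `SizingPolicy` interface is asked two questions, whether to create a thread and whether to retire one, and a toy `PolicyPool` consults it on every submission and whenever a worker goes idle. `BoundedPolicy` and every name and timeout below are assumptions made for the example.

```python
import queue
import threading
from typing import Callable, Protocol


class SizingPolicy(Protocol):
    """The two questions the pool asks; implementations hold the strategy."""
    def should_create(self, threads: int, queued: int) -> bool: ...
    def should_retire(self, threads: int, queued: int) -> bool: ...


class BoundedPolicy:
    """Grow while work is queued, up to a cap; shrink when idle, down to a floor."""

    def __init__(self, min_threads: int, max_threads: int) -> None:
        self.min_threads = min_threads
        self.max_threads = max_threads

    def should_create(self, threads: int, queued: int) -> bool:
        return queued > 0 and threads < self.max_threads

    def should_retire(self, threads: int, queued: int) -> bool:
        return queued == 0 and threads > self.min_threads


class PolicyPool:
    def __init__(self, policy: SizingPolicy, idle_timeout: float = 0.05) -> None:
        self._policy = policy
        self._idle_timeout = idle_timeout
        self._work: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._threads = 0

    @property
    def thread_count(self) -> int:
        with self._lock:
            return self._threads

    def submit(self, job: Callable[[], None]) -> None:
        self._work.put(job)
        with self._lock:
            # Ask the policy on every submission whether to add a worker.
            if self._policy.should_create(self._threads, self._work.qsize()):
                self._threads += 1
                threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            try:
                job = self._work.get(timeout=self._idle_timeout)
            except queue.Empty:
                with self._lock:
                    # Ask the policy whether this idle worker may exit.
                    if self._policy.should_retire(self._threads, self._work.qsize()):
                        self._threads -= 1
                        return
                continue
            try:
                job()
            except Exception:
                pass  # a real pool would log or surface the failure
            finally:
                self._work.task_done()

    def join(self) -> None:
        """Block until every submitted job has finished."""
        self._work.join()


pool = PolicyPool(BoundedPolicy(min_threads=1, max_threads=4))
results = []
for n in range(10):
    pool.submit(lambda n=n: results.append(n * n))
pool.join()
print(sorted(results))
```

A CPU-monitoring policy would implement the same two methods and consult a utilisation sample instead of (or as well as) the queue length, which is what makes it possible to swap strategies per deployment without touching the pool itself.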
 
Kieran Benton

>> I'm certain we do have a problem. Sometimes we are processing
> Right. Oh dear :(

Yup, it's a tricky one, isn't it? :) It's not actually too bad balancing it
by hand - I would just like a more flexible solution that responds to
the type of load it's under.
> Yes - although it does tend to be a pig to code.

Absolutely - the semantics of having to break quite a simple processing
block up into async methods... Well - yuck! It would be really quite
nice if you could have an async method run in place, yield the thread to
the pool for some other processing and then ask for one back again when
it needs it. Not going to happen of course - mainly as that is a gross
simplification! :)
> Hmm. Yes, it sounds like a CPU monitor *might* work. If I were you, I'd
> try adapting my threadpool to use some kind of interface which it asks
> about whether or not to kill a thread and whether or not to create one
> - then one could have different policies for different situations.
> Easier said than done, of course...

Cheers for your advice Jon, your input is much appreciated. I think I
will go down that route!
 
Jon Skeet [C# MVP]

Kieran Benton said:
> Yup, it's a tricky one, isn't it? :) It's not actually too bad balancing it
> by hand - I would just like a more flexible solution that responds to
> the type of load it's under.

Yes. That sort of heuristic approach is usually a pain to implement, of
course.
> Absolutely - the semantics of having to break quite a simple processing
> block up into async methods... Well - yuck! It would be really quite
> nice if you could have an async method run in place, yield the thread to
> the pool for some other processing and then ask for one back again when
> it needs it. Not going to happen of course - mainly as that is a gross
> simplification! :)

It would be interesting to try to develop a language extension which
allowed that. In some ways it would be akin to the "yield" in C# 2.0
for implementing iterators...
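
Editor's sketch of that "yield the thread and get one back later" idea, using Python generators as a rough analogue of the iterator-style `yield` mentioned above. It is not a C# feature or an existing library. Each work item is written as one straight-line function that yields a callable whenever it needs to block; a hypothetical driver, `run_all`, hands that callable to a separate I/O pool and resumes the item with the result, so the processing thread moves on to other items in the meantime. `slow_lookup` and `handle_message` are made-up examples, and error handling is left out.

```python
import functools
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def run_all(work_items: list, io_threads: int = 4) -> None:
    """Drive generator-based work items on the calling thread."""
    ready: queue.Queue = queue.Queue()   # (generator, value to send into it)
    for item in work_items:
        ready.put((item, None))
    pending = len(work_items)
    with ThreadPoolExecutor(max_workers=io_threads) as io_pool:
        while pending:
            item, value = ready.get()
            try:
                blocking_call = item.send(value)   # run until the next yield
            except StopIteration:
                pending -= 1                       # this item has finished
                continue
            future = io_pool.submit(blocking_call)
            # When the blocking call completes, queue the item for resumption.
            # (If the call raised, the item would never resume: not handled here.)
            future.add_done_callback(
                lambda f, item=item: ready.put((item, f.result())))


def slow_lookup(name: str, delay: float) -> str:
    """Stand-in for a blocking call such as a DB query or a COM method."""
    time.sleep(delay)
    return f"{name} done"


def handle_message(name: str, delay: float, log: list):
    log.append(f"{name} started")
    # Hand the blocking step to the driver and give up the thread until it returns.
    result = yield functools.partial(slow_lookup, name, delay)
    log.append(result)


log: list = []
run_all([handle_message("a", 0.05, log), handle_message("b", 0.01, log)])
print(log)
```

Both items start before either blocking call finishes, which is the point: the handler reads as a single block of code, yet the thread running it is never parked on the slow step. This only moves the blocking onto the I/O pool rather than removing it, so it helps most when the blocking calls have a genuinely asynchronous form underneath.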
> Cheers for your advice Jon, your input is much appreciated. I think I
> will go down that route!

Best of luck - and if the IP doesn't prevent you from showing the code
afterwards, I'd be interested in bringing this flexibility into the
custom threadpool code.
 
