HttpWebRequest and Multi Threaded Apps

  • Thread starter: Guest

Hi,

I'm in the process of writing a program that crawls a website. I'm using the
HttpWebRequest and HttpWebResponse classes to get content. To make my
application more scalable, it is multithreaded, with each thread making a
different request.

One problem I've run into is that when writing the response stream (from the
HttpWebResponse.GetResponseStream() method) to the filesystem, a sufficiently
large response blocks all the other executing threads. If the write takes
longer than the timeout, all the other threads that are in the process of
making a web request time out.

I've made every effort to ensure that my application is not locking. Given
the scenario I've described, would using asynchronous web requests be more
efficient? I've not worked with them before, so I'd be grateful for your
opinions.
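
For reference, a minimal sketch of what an asynchronous request could look like, using BeginGetResponse and EndGetResponse. The class and method names (AsyncDownloadSketch, CopyStream, BeginDownload) are hypothetical and not from the application described above. Note two assumptions: the body is still read synchronously inside the callback, and the Timeout property is not applied to requests started with BeginGetResponse, so a real crawler would need its own timeout handling.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper; names are illustrative only.
public static class AsyncDownloadSketch
{
    // Copies source to destination in 8 KB chunks and returns the byte count.
    public static long CopyStream(Stream source, Stream destination)
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Starts the request and returns immediately. The callback runs later,
    // once the response headers have arrived, on a thread picked by the framework.
    public static IAsyncResult BeginDownload(string url, string path)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        return request.BeginGetResponse(delegate(IAsyncResult ar)
        {
            try
            {
                // Disposing the response releases the underlying connection.
                using (WebResponse response = request.EndGetResponse(ar))
                using (Stream body = response.GetResponseStream())
                using (FileStream file = File.Create(path))
                {
                    CopyStream(body, file);
                }
            }
            catch (WebException ex)
            {
                Console.Error.WriteLine(url + " failed: " + ex.Status);
            }
            catch (IOException ex)
            {
                Console.Error.WriteLine(url + " I/O error: " + ex.Message);
            }
        }, null);
    }
}
```

The calling thread is free as soon as BeginDownload returns, so a small number of threads can keep many requests in flight.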

Cheers,

Mark
 
You have to be careful when using the HttpWebRequest and HttpWebResponse
classes in a multithreaded environment. They use the .NET ThreadPool to do
their work (internally they are asynchronous and use waits to block until a
response arrives). By default the .NET ThreadPool has only 25 worker threads
per processor, so if you are spawning lots of threads and making lots of
requests you'll eventually start getting InvalidOperationExceptions because
there are not enough free threads in the pool.
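
One way to check whether the pool is actually running dry is to log its headroom while the crawler is busy. This is only a sketch; the ThreadPoolHeadroom class is a hypothetical name, while ThreadPool.GetMaxThreads and ThreadPool.GetAvailableThreads are the real framework calls.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic helper; not part of the original application.
public static class ThreadPoolHeadroom
{
    // Number of worker threads currently free in the .NET ThreadPool.
    public static int FreeWorkerThreads()
    {
        int freeWorkers, freeIo;
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return freeWorkers;
    }

    // One-line summary suitable for a log file.
    public static string Describe()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return string.Format("workers {0}/{1} free, I/O {2}/{3} free",
            freeWorkers, maxWorkers, freeIo, maxIo);
    }
}
```

If the free worker count stays near zero while requests are timing out, pool exhaustion is a plausible cause; if it stays high, the bottleneck is probably elsewhere.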


If you think that a particular web request is taking too long I suggest
setting a lower Timeout on the HttpWebRequest object that is making it...
that way the thread will not wait as long.

By the way, this should give you an idea of what GetResponse looks like. It
might help you understand what is going on (I'd also suggest getting
Reflector and having a look inside the BeginGetResponse method to see exactly
what is happening in there too):

public override WebResponse GetResponse()
{
    if (this._HaveResponse)
    {
        this.CheckFinalStatus();
        if (this._HttpResponse != null)
        {
            return this._HttpResponse;
        }
    }
    IAsyncResult result1 = this.BeginGetResponse(null, null);
    if ((this._Timeout != -1) && !result1.IsCompleted)
    {
        try
        {
            result1.AsyncWaitHandle.WaitOne(this._Timeout, false);
            if (!result1.IsCompleted)
            {
                this.Abort();
                throw new WebException(SR.GetString("net_timeout"),
                    WebExceptionStatus.Timeout);
            }
        }
        catch (Exception)
        {
            this.Abort();
            throw;
        }
    }
    return this.EndGetResponse(result1);
}
 
Hi Brian,

Thanks for your response. With regard to the .NET ThreadPool issue, my
application uses a custom thread pool, specifically Stephen Toub's
ManagedThreadPool class:

http://www.gotdotnet.com/Community/...mpleGuid=bf59c98e-d708-4f8e-9795-8bae1825c3b6

I decided to go with this thread pool implementation as it appears to have
been used in good products like .Text already. Using this implementation, I
think I avoid the issue of running out of threads because (from inspection of
the ManagedThreadPool code) it makes no reference to the ThreadPool class.

The problem with setting a lower timeout is that, in my application, if
writing the response stream takes too long and causes the currently executing
thread to time out, all the other waiting threads time out as well.

I'm going to lower the number of threads that I'm using anyway to see if there
is an improvement. Would I be right in assuming that multiple instances of
my application running concurrently will not conflict with each other, as they
will be running in their own processes?

Thanks,

Mark
 
Regardless of which thread pool you use, the HttpWebRequest class does all its
operations internally using the .NET ThreadPool. Even synchronous
HttpWebRequests are spun out onto a separate thread from the .NET ThreadPool,
and the calling thread then waits for a response. Basically, the synchronous
call internally calls BeginGetResponse, waits for it to complete, calls
EndGetResponse, and then returns.

After re-reading your original post, I'm surprised to hear that the
synchronous call to GetResponse is blocking the rest of the threads in your
thread pool. Very unusual.

Multiple instances won't interfere with each other.
 
Hi Brian,

Thanks for the response.

Looking into my problem further, I find that I have occasional problems
when reading the response stream. Calling Stream.Read() sometimes times out
with the following exception:

Message: The operation has timed out
Stack Trace: at System.ConnectStream.Read(...)

It doesn't matter whether I set HttpWebRequest.Timeout to 10000 or 100000; I
still occasionally get these timeouts. When I debug the application, it looks
as if the thread gets part of the way through reading the bytes and then
hangs.
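
One possible explanation, offered as an assumption rather than a diagnosis: HttpWebRequest.Timeout covers GetResponse and GetRequestStream, while reads and writes on the response stream are governed by the separate ReadWriteTimeout property (default 300,000 ms) in framework versions that expose it. The sketch below sets both; RequestTimeouts and its parameter names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical factory; creating the request does not contact the server.
public static class RequestTimeouts
{
    public static HttpWebRequest Create(string url, int responseTimeoutMs, int readWriteTimeoutMs)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Applies to the synchronous GetResponse and GetRequestStream calls.
        request.Timeout = responseTimeoutMs;
        // Applies to each Read or Write on the request/response stream.
        request.ReadWriteTimeout = readWriteTimeoutMs;
        return request;
    }
}
```

For example, RequestTimeouts.Create(url, 10000, 30000) would allow 10 seconds for the response headers and 30 seconds for each individual Stream.Read() call.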

I can provide code if requested, but I was wondering if anyone else has seen
symptoms like this, and if they could suggest a remedy.

I've read on other newsgroups that my problem could be related to
ServicePoints or the ServicePointManager, but as I have never worked with
these classes before, I'm not too sure. If anyone could point me to some
useful resources, or explain them to me, it would be greatly appreciated.
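
As background on why ServicePointManager comes up in this context: in the .NET Framework it caps the number of concurrent connections a client application opens to a single host, and the default cap is 2. Requests beyond the cap wait for a free connection, and a response that is never closed keeps its connection occupied. Whether that is the cause here is an assumption. The sketch below raises the cap; ConnectionLimits is a hypothetical name and the right value depends on the crawler and the target server.

```csharp
using System;
using System.Net;

// Hypothetical helper; call once at startup, before any requests are created.
public static class ConnectionLimits
{
    // Sets the per-host connection cap used for newly created ServicePoints
    // and returns the value now in effect.
    public static int Raise(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }
}
```

Raising the limit only helps if each HttpWebResponse is also closed (or disposed) once its stream has been read, so that its connection is returned for reuse.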

Thanks,

Mark
 