Multithreading WebRequests, a good and stable approach?

Nightcrawler · Dec 7, 2007

I have a webservice that gets data from three websites and puts the
result into a datatable and returns that datatable.

Currently the webservice makes a WebRequest (using parameters in a
querystring) to the first website, adds the data into a datatable then
moves on to the second website, merges the datatable together and
finally gets data from the third website and merges that datatable
with the existing one.

This works fine, but I recently changed the client interface to pull
the information using AJAX/javascript. A browser like Firefox will
raise an error ("script has stopped responding") if the javascript does
not respond within 10 seconds. This puts more pressure on the webservice
to execute and return results within those 10 seconds.

I started looking into multithreading these webRequests and firing the
requests at the same time.

My questions are:

1. Is this a good approach? Are there any risks in multithreading
multiple webRequests like this?
2. Can anyone point me in the right direction as to how to make these
webrequests using multithreading?
3. More importantly, how do I merge the data into one datatable once
all three webRequests are completed?

Any feedback is appreciated.

Thanks

Peter Duniho · Dec 7, 2007

> [...]
> My questions are:
>
> 1. Is this a good approach? Are there any risks in multithreading
> multiple webRequests like this?

Multithreading always has risks. But I don't think there is anything
unusual in this scenario. Just the usual concurrency issues.

> 2. Can anyone point me in the right direction as to how to make these
> webrequests using multithreading?

Use the async methods (the ones whose names start with "Begin..." and
"End..."). This will prevent any thread from being committed to the
operation until it actually completes. Or, it should anyway, on operating
systems that support IOCP (I/O completion ports).

> 3. More importantly, how do I merge the data into one datatable once
> all three webRequests are completed?

Where's the datatable? What part of the issue are you having problems
with?

It sounds as though you're implementing some sort of web components
application, where there's server-side and client-side parts. Is the
question about how to get the data back to the client when everything's
done? Or is it simply about how to manage your DataTable object?

If the latter, it should be pretty much the same as however you do it
synchronously, except you'll have to provide synchronization for the
DataTable object (using the lock() statement, for example). If you need
the data in the DataTable object in some specific order (for example, in
the order the requests were started), you'll need to impose that order.
How best to do that depends on how you've defined the order.
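The sketch below makes the locking advice concrete. It is not code from this thread. Plain threads stand in for completed web requests, and the names `LockedTableSketch` and `Run` are hypothetical. Every write to the shared DataTable goes through one lock. Each row also records the order in which its request was started, so that order can be imposed afterward if it turns out to matter.

```csharp
using System;
using System.Data;
using System.Threading;

public static class LockedTableSketch
{
    // Simulates `requestCount` completed requests, each adding `rowsPerRequest`
    // rows to one shared DataTable from its own thread.
    public static DataTable Run(int requestCount, int rowsPerRequest)
    {
        DataTable table = new DataTable("Results");
        table.Columns.Add("RequestIndex", typeof(int)); // order the request was started in
        table.Columns.Add("Value", typeof(string));

        // DataTable is not safe for concurrent writes, so every write goes through this lock.
        object tableLock = new object();

        Thread[] workers = new Thread[requestCount];
        for (int i = 0; i < requestCount; i++)
        {
            int requestIndex = i; // each thread gets its own copy of the index
            workers[i] = new Thread(delegate()
            {
                for (int r = 0; r < rowsPerRequest; r++)
                {
                    lock (tableLock)
                    {
                        table.Rows.Add(requestIndex, "row " + r);
                    }
                }
            });
            workers[i].Start();
        }

        // Wait for every worker before reading the table.
        foreach (Thread worker in workers)
        {
            worker.Join();
        }

        // Rows end up in whatever order the threads happened to run. If a specific
        // order is needed, impose it afterward, here by request start order.
        DataView byStartOrder = new DataView(table, "", "RequestIndex ASC", DataViewRowState.CurrentRows);
        return byStartOrder.ToTable();
    }
}
```

Sorting afterward is only one way to impose an order. Another is to give each request its own slot, such as an array element, and merge the slots in sequence once all of them have finished.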

If the former, I have no idea. Sounds like a web applications question,
and I don't know anything about that.

Pete

Nightcrawler · Dec 7, 2007

Pete,

This is the way I have it set up. I only included enough code so you
understand the logic; I stripped out a bunch that is not required. So
the part where I need to optimize is GetSearchResultsArray(). I
would like to fire the three GetResults calls at the same time and then
be able to merge the data together into one table (no particular order).

Thanks for your help!

[WebMethod]
[System.Web.Script.Services.ScriptMethod(UseHttpGet = true)]
public result[] GetSearchResultsArray()
{
    // BuildDataTable() just returns a datatable with the columns specific
    // to this operation (code not included for simplicity)
    DataTable dt = BuildDataTable();

    dt.Merge(GetResults("url1", "parameters1"));
    dt.Merge(GetResults("url2", "parameters2"));
    dt.Merge(GetResults("url3", "parameters3"));

    // Takes the datatable and converts it to a List (code not included
    // for simplicity)
}

private DataTable GetResults(string url, string parameters)
{
    string result = GetSearchResults(url, parameters);
    DataTable table = BuildDataTable();

    // Does processing of the result in the response string and puts it
    // into the prebuilt datatable (code not included for simplicity)
    return table;
}

private string GetSearchResults(string url, string parameters)
{
    string httpRequest = String.Format("{0}?{1}", url, parameters);

    WebRequest webRequest = WebRequest.Create(httpRequest);

    // Dispose of the response and the reader even if reading fails.
    using (WebResponse response = webRequest.GetResponse())
    using (StreamReader responseReader = new StreamReader(response.GetResponseStream()))
    {
        return HttpUtility.UrlDecode(responseReader.ReadToEnd());
    }
}

Nicholas Paldino [.NET/C# MVP] · Dec 7, 2007

Thomas,

This is a perfectly fine idea, but it will require a little work. The
HttpWebRequest/HttpWebResponse classes absolutely support making calls
asynchronously.

The simplest way would be to set up your three web request
(HttpWebRequest) instances and then call BeginGetResponse on each of them in
succession, storing the IAsyncResult implementations they return.

Then, right after that, you would call EndGetResponse on each instance,
passing the IAsyncResult implementation that the same instance returned
from BeginGetResponse.

At this point, you would have your three results and you could insert
them all into the data table to be returned.

This works because getting all three results will take only about as
long as the longest request (assuming they go to different websites; the
HTTP specification has a note in it about how many concurrent connections
can be opened to a website at the same time, I believe), and your
successive calls to EndGetResponse will not block if a request has already
completed before EndGetResponse is called.
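The sketch below shows the "begin all, then end all" approach. It is a minimal sketch, not Nicholas's code. It uses the base `WebRequest` class, as the original poster's `GetSearchResults()` does, and a hypothetical `BeginAllThenEndAll.Fetch()` helper that takes the full request URIs. Because it accepts any URI `WebRequest.Create()` understands, it can be tried against local `file://` URIs without a network. The HTTP note mentioned above is most likely RFC 2616 section 8.1.4, which says a single-user client SHOULD NOT keep more than two connections open to one server. In .NET Framework client applications the per-host limit is exposed as `ServicePointManager.DefaultConnectionLimit`. `WebRequest` is marked obsolete in .NET 6 and later in favor of `HttpClient`, but the Begin/End pattern here is the one this thread discusses.

```csharp
using System;
using System.IO;
using System.Net;

public static class BeginAllThenEndAll
{
    // Starts every request before waiting on any of them, then collects the
    // response bodies in the order the requests were started.
    public static string[] Fetch(Uri[] uris)
    {
        WebRequest[] requests = new WebRequest[uris.Length];
        IAsyncResult[] pending = new IAsyncResult[uris.Length];

        // Step 1: fire all requests. BeginGetResponse returns without waiting
        // for the response; no callback is needed in this variant.
        for (int i = 0; i < uris.Length; i++)
        {
            requests[i] = WebRequest.Create(uris[i]);
            pending[i] = requests[i].BeginGetResponse(null, null);
        }

        // Step 2: collect. Each EndGetResponse call blocks only until its own
        // response is available; one that already arrived returns immediately.
        string[] bodies = new string[uris.Length];
        for (int i = 0; i < uris.Length; i++)
        {
            using (WebResponse response = requests[i].EndGetResponse(pending[i]))
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                bodies[i] = reader.ReadToEnd();
            }
        }
        return bodies;
    }
}
```

In the original poster's code, each body would then go through the same parsing as `GetResults()` and be merged into the DataTable. All merging happens on the calling thread, so this variant needs no locking.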

However, you can improve on this if you need to squeeze out more
performance. You could pass callback routines to the BeginGetResponse
methods, in which you would merge the results with your data set. Then,
when they are all complete, you could signal the waiting main thread that
you are done (through a WaitHandle of some kind). That is a little more
complex, since you don't want to create an individual event handle for
each web request (since you are in a web server, I imagine you are going to
be calling this a lot).

Anonymous methods can help though. I would do this:

// This can have any inputs and outputs you like. I'm just using it as an
// example, but it is basically the entry point for your web request.
public DataTable MyMethod()
{
    // Create the three web requests.
    HttpWebRequest wr1 = ...;
    HttpWebRequest wr2 = ...;
    HttpWebRequest wr3 = ...;

    // This is the number of web requests that still have to complete.
    int requestsToComplete = 3;

    // The data table to return.
    DataTable dt = ...;

    // The event which will be set to indicate that processing is done.
    using (ManualResetEvent done = new ManualResetEvent(false))
    {
        // The async callback which will process the data. You will need
        // separate code for each if they have different routines to
        // populate the data table.
        AsyncCallback callback =
            delegate(IAsyncResult ar)
            {
                // Get the request from the state.
                HttpWebRequest request = ar.AsyncState as HttpWebRequest;

                // Call EndGetResponse.
                using (HttpWebResponse response = (HttpWebResponse)
                    request.EndGetResponse(ar))
                {
                    // Add to the data table here. This is the code specific
                    // to the request.
                    // You have to synchronize access to the table as well.
                    lock (dt)
                    {
                        // Process the response here and add the rows you
                        // need to.
                    }
                }

                // Decrement the count on the requests to complete. If it is
                // zero, then set the event.
                if (Interlocked.Decrement(ref requestsToComplete) == 0)
                {
                    // Set the event.
                    done.Set();
                }
            };

        // Begin the calls here.
        wr1.BeginGetResponse(callback1, wr1);
        wr2.BeginGetResponse(callback2, wr2);
        wr3.BeginGetResponse(callback3, wr3);

        // Wait on the event here.
        done.WaitOne();

        // At this point, the data table will be populated, so you can
        // return it.
        return dt;
    }
}
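Two things in the code above are easy to miss. First, if `EndGetResponse()` throws (for example a WebException for an HTTP error status), the counter is never decremented and `WaitOne()` blocks forever. Second, an exception escaping a thread-pool callback ends the process. The sketch below handles both while keeping one counter and one ManualResetEvent for the whole batch. It assumes plain `Action` work items on the thread pool in place of the three web requests, and the `FanOutAndWait` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class FanOutAndWait
{
    // Runs every work item on the thread pool and blocks until all have finished.
    // One counter and one ManualResetEvent serve the whole batch.
    public static List<Exception> RunAll(params Action[] workItems)
    {
        List<Exception> errors = new List<Exception>();
        object errorsLock = new object();
        if (workItems.Length == 0)
        {
            return errors; // nothing to wait for
        }

        int remaining = workItems.Length;
        using (ManualResetEvent allDone = new ManualResetEvent(false))
        {
            foreach (Action work in workItems)
            {
                Action item = work; // copy for the closure (only required before C# 5)
                ThreadPool.QueueUserWorkItem(delegate
                {
                    try
                    {
                        item();
                    }
                    catch (Exception ex)
                    {
                        // An exception escaping a thread-pool thread would end the process.
                        lock (errorsLock)
                        {
                            errors.Add(ex);
                        }
                    }
                    finally
                    {
                        // Decrement even on failure; otherwise WaitOne() below never returns.
                        if (Interlocked.Decrement(ref remaining) == 0)
                        {
                            allDone.Set();
                        }
                    }
                });
            }
            allDone.WaitOne();
        }
        return errors;
    }
}
```

On .NET 4 and later, `System.Threading.CountdownEvent` packages the counter and the wait handle into one object: call `Signal()` in the `finally` and `Wait()` in place of `WaitOne()`. The usual modern replacement for the whole Begin/End arrangement is `Task.WhenAll` over `HttpClient` calls.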

Nightcrawler · Dec 7, 2007

Nicholas,

Many thanks for your input.

Yes, you are right, this is a web environment so it will be called
a lot. Also, yes, each call will have a different routine for working
with the data, so I will have to set up three different
AsyncCallback callbacks.

I will dive into this right away.

Thanks a bunch!

Peter Duniho · Dec 7, 2007

> Pete,
>
> This is the way I have it set up. I only included enough code so you
> understand the logic; I stripped out a bunch that is not required. So
> the part where I need to optimize is GetSearchResultsArray(). I
> would like to fire the three GetResults calls at the same time and then
> be able to merge the data together into one table (no particular order).

Okay, the "no particular order" is helpful. If the order did matter, that
could be easily addressed, but it does make the code simpler to not have
to worry about it.

Let's start with the suggestions and code that Nicholas posted, since his
basic response is very useful.

Based on that response, I'd offer a couple of observations:

* First, the difference between his two suggestions -- calling
EndGetResponse() in sequence for each request, versus setting a waitable
event -- is not very great, at least not as he demonstrated it. In either
case, the code will simply stop before exiting the method that starts all
three requests, so they have the same effect.

Where setting the event handle might be useful is if you had some code
_somewhere else_ that would wait on it, in a different thread. For
example, let's say you ran the code he posted in the main thread in
response to something, but had a different thread sitting around waiting
to process completed data retrievals. Then that different thread could
use the waitable event as its signal to do more work. Of course, in that
scenario you wouldn't create the waitable event in the code that starts
the requests. It'd be stored somewhere more accessible so that the other
thread could already be waiting on it.

* Second, his sample provides a very good illustration of the
synchronization required for the DataTable. I like to follow Jon's advice
to not lock using the actual object, but rather to create a separate
"object" instance for use in locking. But otherwise, his sample shows
what I meant when I wrote of the need to address concurrency issues by
synchronizing access to the DataTable.

* Finally, I think Nicholas meant to just write "callback" instead of
"callback1", "callback2", and "callback3" when he calls BeginGetResponse().

Now, how would I adjust his sample to suit the description you've given
above?

I would get rid of the synchronization at the end of his method, as well
as the waitable event altogether. I would also, of course, create a new
object for locking the DataTable. Finally, without the waitable event,
instead I would just call whatever code you have that needs to be called
when all of the requests have completed.

So, taking Nicholas's code as the starting point, here's what it'd look
like instead:

public void MyMethod()
{
    // Create the three web requests.
    HttpWebRequest wr1 = ...;
    HttpWebRequest wr2 = ...;
    HttpWebRequest wr3 = ...;

    // This is the number of web requests that still have to complete.
    int requestsToComplete = 3;

    // The data table to fill.
    DataTable dt = ...;

    // [an object used to synchronize access to the DataTable -- Pete]
    object objLock = new object();

    // The async callback which will process the data. You will need
    // separate code for each if they have different routines to
    // populate the data table.
    AsyncCallback callback =
        delegate(IAsyncResult ar)
        {
            // Get the request from the state.
            // [note that I've changed to a straight cast from the "as"
            // that Nicholas had. I only use "as" if I've got some code
            // that will actually deal with a failed cast. Otherwise,
            // you just get a delayed exception, and a less-useful one at
            // that, since the exception is a null reference instead of the
            // more informative invalid cast that actually describes what
            // went wrong -- Pete]
            HttpWebRequest request = (HttpWebRequest)ar.AsyncState;

            // Call EndGetResponse.
            using (HttpWebResponse response = (HttpWebResponse)
                request.EndGetResponse(ar))
            {
                // Add to the data table here. This is the code specific
                // to the request.
                // You have to synchronize access to the table as well.
                lock (objLock)
                {
                    // Process the response here and add the rows you need to.

                    // [here is where you'd convert the response to a
                    // DataTable and then call DataTable.Merge() with the
                    // results, for example. Noting, of course, that in this
                    // scenario it might be easier to just add the data to
                    // the original table as it's generated from the response.
                    // But if that were really true, maybe you would have done
                    // it that way in the original code too, so I don't really
                    // know. -- Pete]
                }
            }

            // Decrement the count on the requests to complete. If it is
            // zero, then do the final processing.
            if (Interlocked.Decrement(ref requestsToComplete) == 0)
            {
                // [here you'd call whatever method needs executing when all
                // of the data has been retrieved. If that method includes any
                // calls to update things in the UI, you'll either need to use
                // Control.Invoke() here to call that method, or in that method
                // use Control.Invoke() to do the UI-specific stuff -- Pete]
            }
        };

    // Begin the calls here.
    wr1.BeginGetResponse(callback, wr1);
    wr2.BeginGetResponse(callback, wr2);
    wr3.BeginGetResponse(callback, wr3);
}
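Pete's version has no waiting: whichever request finishes last triggers the final step. The sketch below exercises that pattern without a network. It assumes each request is represented by a hypothetical `Func<DataTable>` that does the fetch and parse, which is the role `GetResults()` plays in the original poster's code. It also adds a `try`/`catch`/`finally` so that a failed request still counts toward completion. A method built this way returns before the data exists, and in a web service there is no UI thread, so the `Control.Invoke()` remark applies only to a Windows Forms client.

```csharp
using System;
using System.Data;
using System.Threading;

public static class LastOneFinishes
{
    // Starts every fetch and returns at once. Whichever fetch finishes last calls
    // onAllDone with the merged table and the number of fetches that failed.
    public static void Start(Func<DataTable>[] fetchers, DataTable result,
                             Action<DataTable, int> onAllDone)
    {
        if (fetchers.Length == 0)
        {
            onAllDone(result, 0);
            return;
        }

        object tableLock = new object(); // dedicated lock object, not the table itself
        int remaining = fetchers.Length;
        int failed = 0;

        foreach (Func<DataTable> fetch in fetchers)
        {
            Func<DataTable> current = fetch; // copy for the closure (pre-C# 5 compilers)
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    DataTable partial = current(); // the slow part runs outside the lock
                    lock (tableLock)
                    {
                        result.Merge(partial);
                    }
                }
                catch (Exception)
                {
                    // Count the failure instead of letting it end the process.
                    Interlocked.Increment(ref failed);
                }
                finally
                {
                    // Only the last finisher sees zero, so onAllDone runs exactly once.
                    if (Interlocked.Decrement(ref remaining) == 0)
                    {
                        onAllDone(result, failed);
                    }
                }
            });
        }
        // No wait here: control returns to the caller while the fetches run.
    }
}
```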

Hope that helps.

Pete

Peter Duniho · Dec 7, 2007

> Yes, you are right, this is a web environment so it will be called
> a lot. Also, yes, each call will have a different routine for working
> with the data, so I will have to set up three different
> AsyncCallback callbacks.

For the record, the previous code you posted illustrating what you're
doing uses the same method to process all three requests. This suggests
that you only need one callback method as well. If there are specific
parameters guiding each specific request, those can easily be incorporated
into the anonymous method (in fact, IMHO it can be easier when using an
anonymous method than if you had to pass them directly, as long as you
watch out for variable capturing).
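The sketch below shows what building per-request parameters into one anonymous method looks like, and what the variable-capturing trap is. The names are hypothetical, and the handlers just format a string instead of filling a DataTable. An anonymous method captures variables, not values. A `for` loop has a single loop variable shared by every iteration; C# 5 changed this for `foreach` but not for `for`.

```csharp
using System;
using System.Collections.Generic;

public static class CapturedParameters
{
    // One anonymous method, specialized per request by the variable it captures.
    public static List<Func<string, string>> Correct(string[] sources)
    {
        List<Func<string, string>> handlers = new List<Func<string, string>>();
        for (int i = 0; i < sources.Length; i++)
        {
            string source = sources[i]; // declared inside the loop: one variable per request
            handlers.Add(delegate(string response) { return source + ": " + response; });
        }
        return handlers;
    }

    // The pitfall: every handler shares the single loop variable `i`, so by the
    // time any callback runs they all see its final value (sources.Length).
    public static List<Func<string, string>> Pitfall(string[] sources)
    {
        List<Func<string, string>> handlers = new List<Func<string, string>>();
        for (int i = 0; i < sources.Length; i++)
        {
            handlers.Add(delegate(string response) { return "request #" + i + ": " + response; });
        }
        return handlers;
    }
}
```

With three sources, `Correct(...)[0]("x")` yields `"site1: x"`, while every handler from `Pitfall(...)` reports request #3.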

Pete

Nightcrawler · Dec 7, 2007

Thank you both for your input. I greatly appreciate it.

I will test it out and see what kind of improvement I will be able to
get in my webservice requests in terms of time.

Thanks

Nightcrawler · Dec 12, 2007

Pete,

I am trying your code but it doesn't seem to work.

I tried Nicholas's code and it worked fine. I then adjusted it to try
yours by removing the ManualResetEvent and modifying it to match your post,
but now it simply returns nothing, almost as if the requests never
happened. I have a feeling I am missing a line of code that prevents
the method from exiting before the requests are done.

Please let me know.

Thanks

Peter Duniho · Dec 12, 2007

> Pete,
>
> I am trying your code but it doesn't seem to work.
>
> I tried Nicholas's code and it worked fine. I then adjusted it to try
> yours by removing the ManualResetEvent and modifying it to match your post,
> but now it simply returns nothing, almost as if the requests never
> happened. I have a feeling I am missing a line of code that prevents
> the method from exiting before the requests are done.

Why do you want to prevent the method from exiting?

I thought the whole point here was that if your code doesn't return, it
appears unresponsive to the browser, which then cancels your code.

The code Nicholas posted may speed things up a bit by parallelizing the
requests, but ultimately you're still waiting, and if any one request
takes too long, all of the requests are basically useless.

Presumably, you've got some other code that would be executed after the
method returns, taking all of the responses in aggregate and doing
something useful with them. In the code I posted, you should execute that
code where I indicated by my comments, once the counter reaches zero. You
may want to just put all that code into a method, and then call that
method where I've indicated.

I don't think there's any reason to prevent the method from exiting, but
if that's a requirement of yours for some reason then no, the code I
posted isn't going to work for you. I wrote it specifically to return as
soon as it could, rather than waiting around for the asynchronous i/o to
complete, since that's generally the point of doing asynchronous i/o (as
Nicholas points out, even initiating the i/o asynchronously is only going
to allow a limited number of the requests to actually proceed in parallel,
depending on the system configuration).

Pete

Nightcrawler · Dec 12, 2007

Pete,

The method will be called through an AJAX user interface, so it will
be exposed in a webservice.

So my current code has 3 different callbacks, since each of them has
separate routines specific to each web request. Once the datatable has
been populated through my three callback routines, I do some filtering
of the datatable using a dataview, then finally convert all the data
in the table to a List and return it as an array to the calling
javascript, which will display it to the user.

Are you saying I could return portions using your code? So, if
webrequest 1 is done it will return that to the javascript, then if
request 3 is done, it will return that and then finally request 2
(assuming they finish in that order).

Thanks for your input.

Nightcrawler · Dec 12, 2007

Pete,

On another note, how can I include a regular method call to a table
adapter? Say I want to fetch data from 3 webrequests and one request
using a dataadapter and my own database. Could I incorporate that
logic into this as well?

So theoretically, four threads would work at the same time to populate
a datatable (3 webrequests and one dataadapter using SQL Server), which
is then returned through a webservice.

The reason I have to optimize these requests is simply because browsers
like Firefox will throw a warning that the javascript stopped
working if the webservice call through javascript takes longer than 10
seconds. I want to avoid that at all costs.

Thanks

Peter Duniho · Dec 12, 2007

> Pete,
>
> The method will be called through an AJAX user interface, so it will
> be exposed in a webservice.
>
> So my current code has 3 different callbacks, since each of them has
> separate routines specific to each web request. Once the datatable has
> been populated through my three callback routines, I do some filtering
> of the datatable using a dataview, then finally convert all the data
> in the table to a List and return it as an array to the calling
> javascript, which will display it to the user.

The basic theory is the same as the code that Nicholas and I posted. In
the case of my proposal, you can still use three different callbacks, as
long as each includes some logic as I've suggested at the end to detect
whether all of the requests have completed.

> Are you saying I could return portions using your code? So, if
> webrequest 1 is done it will return that to the javascript, then if
> request 3 is done, it will return that and then finally request 2
> (assuming they finish in that order).

I have no idea if that would work. It might, but I have no way of
knowing. For one, I don't do much web development, and I don't have any
idea how .NET interacts with the web client stuff. For another, I don't
know enough about your particular implementation and how that would work
with the web client to know whether returning intermediate results would
work.

What I do know is that assuming you currently have an implementation that
returns just the final complete results, and assuming there's some way for
that implementation to respond to the web client (with or without the
actual results) for it to not generate some kind of timeout error, then
there is a simple way (as illustrated in this thread) to asynchronously
accumulate the responses as well as know when all have completed so that
you can take some appropriate action.

Beyond that, you'll need someone who knows more about the web client
aspect of .NET. I know that I've dealt with web pages that take FAR
longer than 20 seconds to return their results, both in terms of pages
that take that long to load as well as pages that appear to load right
away but then have some sort of deferred processing that updates something
in the page later. But I've never bothered to take a look at how those
are implemented. All I know is that it can be done.

Pete

Peter Duniho · Dec 12, 2007

> On another note, how can I include a regular method call to a table
> adapter? Say I want to fetch data from 3 webrequests and one request
> using a dataadapter and my own database. Could I incorporate that
> logic into this as well?

Yes, but since DataAdapter doesn't have an async API, you'll have to
handle that yourself. The most straightforward way would be to use a
BackgroundWorker. The general idea is the same though: provide a delegate
(in this case, used as the handler for the BackgroundWorker.DoWork event)
that does the request and then at the end does the same "am I done with
all requests yet?" sort of logic that the other async handlers do.

In that case, the BackgroundWorker.RunWorkerAsync() method takes the place
of the BeginGetResponse() method. You can either put the "am I done with all
requests yet?" logic at the end of the DoWork handler, or you can create a
separate delegate to handle the BackgroundWorker.RunWorkerCompleted
event. In the latter case, the main advantage is that the event is raised
on the thread that called RunWorkerAsync() (provided that thread has a
synchronization context, as a Windows Forms UI thread does), but since you
need to deal with thread synchronization issues anyway (for the other three
requests), this may not be all that useful in your case.
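The sketch below shows the database leg. It assumes a hypothetical `fill` delegate in place of the real `DataAdapter.Fill()` call, so it runs without a database. `RunWorkerAsync()` starts the work and `DoWork` runs it on a thread-pool thread. The `RunWorkerCompleted` handler then merges the result under the shared lock and calls the same "am I done with all requests yet?" hook the web-request callbacks use. An exception thrown in `DoWork` is reported through `RunWorkerCompletedEventArgs.Error` instead of escaping the worker thread. Without a synchronization context, `RunWorkerCompleted` also runs on a thread-pool thread, which is why the handler still takes the lock.

```csharp
using System;
using System.ComponentModel;
using System.Data;

public static class DatabaseLeg
{
    // Starts the (simulated) database query on a BackgroundWorker.
    // `fill` stands in for a DataAdapter.Fill() call; `onFinished` is the
    // shared "am I done with all requests yet?" hook.
    public static BackgroundWorker Start(Func<DataTable> fill, DataTable shared,
                                         object tableLock, Action onFinished)
    {
        BackgroundWorker worker = new BackgroundWorker();

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            e.Result = fill(); // runs on a thread-pool thread
        };

        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            // Reading e.Result when e.Error is set would throw, so check first.
            if (e.Error == null)
            {
                lock (tableLock)
                {
                    shared.Merge((DataTable)e.Result);
                }
            }
            onFinished(); // same completion check the web-request callbacks run
        };

        worker.RunWorkerAsync(); // takes the place of BeginGetResponse()
        return worker;
    }
}
```

In the combined version, `onFinished` would decrement the same counter as the three web-request callbacks, with the count starting at 4 instead of 3.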

> So theoretically, four threads would work at the same time to populate
> a datatable (3 webrequests and one dataadapter using SQL Server), which
> is then returned through a webservice.
>
> The reason I have to optimize these requests is simply because browsers
> like Firefox will throw a warning that the javascript stopped
> working if the webservice call through javascript takes longer than 10
> seconds. I want to avoid that at all costs.

Well, as I mentioned in my other reply, I can't really comment very much
on the exact interaction with the browser. Whatever the time limit is (10
seconds, 20 seconds, etc.) it seems to me that any one request _could_
take longer than that, and so if you are waiting for them all to complete,
then even if they are all done in parallel you could still wind up hitting
that limit.

While I don't know how you'd implement this, I think it would be better
for the code that the browser is waiting on to return immediately, and
then provide some way to update the page later once the requests have all
completed (or, if possible, update the page as the intermediate results
complete as well).
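One common way to get this "return immediately, update later" behavior without a true push channel is polling. One web method starts the work and returns a ticket at once. The page's javascript then calls a second web method every second or two until the results are ready. The sketch below covers the server side only. Its names are hypothetical, and an in-memory dictionary stands in for real storage. That dictionary would not survive an application restart and would not be shared across servers in a web farm.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading;

// Hypothetical server-side half of a polling design.
public static class SearchJobs
{
    static readonly object jobsLock = new object();
    static readonly Dictionary<Guid, DataTable> finished = new Dictionary<Guid, DataTable>();

    // Would back a "start search" web method: queues the work and returns at once.
    public static Guid Start(Func<DataTable> runSearch)
    {
        Guid ticket = Guid.NewGuid();
        ThreadPool.QueueUserWorkItem(delegate
        {
            DataTable result;
            try
            {
                result = runSearch(); // e.g. the three web requests plus the database query
            }
            catch (Exception)
            {
                result = null; // a real service would record the error; null means "failed"
            }
            lock (jobsLock)
            {
                finished[ticket] = result;
            }
        });
        return ticket;
    }

    // Would back a "get results" web method polled by the page.
    // Returns false while the search is still running.
    public static bool TryGetResults(Guid ticket, out DataTable results)
    {
        lock (jobsLock)
        {
            if (finished.TryGetValue(ticket, out results))
            {
                finished.Remove(ticket); // hand the results out once, then forget the ticket
                return true;
            }
            return false;
        }
    }
}
```

Each poll returns quickly whether or not the data is ready, so no single javascript call waits on the slowest website.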

With web browsers being mainly "pull" data models, I don't really know how
that sort of thing would work. But I know I've seen what _seems_ to be
like a "push" data presentation in a web browser, so it seems like it
ought to be doable somehow.

Pete
