Confusing Asynchronous Threadpool

Maya Sam

Hi all,

I have written the following code to crawl data from multiple websites using
asynchronous calls. It works fine, but I'm confused about how to make the
calling thread stop or sleep until all the other threads in the thread pool
have finished their jobs.

At the moment, every other thread "should" just concatenate string data to a
public string variable when it finishes, but the code goes on to execute the
rest of the calling method before all the threads have completed.

I would be grateful if you could help me out with a solution on how to force
the code to wait for all methods to finish. Here is the code, thank you:


// The calling method is in a different class and a different file. It calls
// the ScanSites() method below, passing an array of multiple web addresses.
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;

namespace Crawler
{
    class Bot
    {
        public static string res = "";

        public Bot()
        {
        }

        public void ScanSites(string[] URLs)
        {
            // for each URL in the collection...
            for (int x = 0; x < URLs.Length; x++)
            {
                WebRequest request = WebRequest.Create(URLs[x]);

                // RequestState is a custom class to pass info
                RequestState state = new RequestState(request, URLs[x]);

                IAsyncResult result = request.BeginGetResponse(
                    new AsyncCallback(UpdateItem), state);

                // abort the request if it has not completed within 30 seconds
                ThreadPool.RegisterWaitForSingleObject(result.AsyncWaitHandle,
                    new WaitOrTimerCallback(ScanTimeoutCallback), state, (30 * 1000), true);
            }
        }

        private void UpdateItem(IAsyncResult result)
        {
            // grab the custom state object
            RequestState state = (RequestState)result.AsyncState;
            WebRequest request = (WebRequest)state.Request;

            // get the Response
            HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(result);
            Stream stream = response.GetResponseStream();

            StringBuilder sb = new StringBuilder(4096);
            byte[] buf = new byte[2048];
            int count;

            while ((count = stream.Read(buf, 0, buf.Length - 8)) > 0)
            {
                sb.Append(Encoding.UTF8.GetString(buf, 0, count));
            }

            StreamWriter sw = new StreamWriter("C:\\Rep\\" + state.URL);
            sw.Write(sb);
            sw.Close();

            // here is the "res" variable that will hold the URL address, followed
            // by | and then by another URL retrieved from another thread
            res += state.URL + "|";
        }

        private static void ScanTimeoutCallback(object state, bool timedOut)
        {
            if (timedOut)
            {
                RequestState reqState = (RequestState)state;
                if (reqState != null)
                    reqState.Request.Abort();
            }
        }
    }

    class RequestState
    {
        public WebRequest Request; // holds the request
        public string URL;
        // public object Data; // store any data in this

        public RequestState(WebRequest request, string url)
        {
            this.Request = request;
            this.URL = url;
            // this.Data = data;
        }
    }
}
 
Jon Shemitz

Maya said:
I would be grateful if you could help me out with a solution on how to force
the code to wait for all methods to finish, here is the code, thank you:

Wow, that's some seriously ill-formatted code. Worse, it's missing a
} or two, so it took a few minutes to get it to a readable state.

Anyhow, probably the easiest way to do what you want is to not use
an async callback, but to 1) build two lists, of WebRequests and
IAsyncResults, 2) use BeginGetResponse to start each request, passing
null, null, and then 3) use EndGetResponse to block until each has
returned.

public void ScanSites2(string[] URLs)
{
    // Start each request
    List<WebRequest> Requests = new List<WebRequest>(URLs.Length);
    List<IAsyncResult> Cookies = new List<IAsyncResult>(URLs.Length);
    foreach (string URL in URLs)
    {
        WebRequest request = WebRequest.Create(URL);
        Requests.Add(request);
        Cookies.Add(request.BeginGetResponse(null, null));
    }

    // Block until all return
    for (int Index = 0; Index < URLs.Length; Index++)
        UpdateItem2((HttpWebResponse)Requests[Index].EndGetResponse(Cookies[Index]),
            URLs[Index]);
}

private void UpdateItem2(HttpWebResponse response, string url)
{
    // It would be much easier to use
    // new StreamReader(response.GetResponseStream()).ReadToEnd()
    Stream stream = response.GetResponseStream();
    StringBuilder sb = new StringBuilder(4096);
    byte[] buf = new byte[2048];
    int count;

    while ((count = stream.Read(buf, 0, buf.Length - 8)) > 0)
    {
        sb.Append(Encoding.UTF8.GetString(buf, 0, count));
    }

    StreamWriter sw = new StreamWriter("C:\\Rep\\" + url);
    sw.Write(sb);
    sw.Close();

    // here is the "res" variable that will hold the URL address, followed
    // by | and then by the next URL; everything now runs on the calling thread
    res += url + "|";
}


--

.NET 2.0 for Delphi Programmers <http://www.midnightbeach.com/.net>

Delphi skills make .NET easy to learn
Being printed - in stores by June
 
Maya Sam

Hi Jon,

Apologies for the misformatting; I didn't think about the fact that you might
copy the code to try it on your side.

I have tried your code and it worked very nicely, many thanks for that. One
last question (I promise!): is there a way to catch the exceptions thrown by
the GetResponse calls? For example, I get a lot of timeout error messages
from sites I can't reach. The code doesn't do anything when a timeout occurs
in the thread, and it hangs forever without breaking. Any ideas how to get
around this issue?

Many thanks again for your help.

Maya.

 
Jon Shemitz

Maya said:
Apologies for the misformatting; I didn't think about the fact that you might
copy the code to try it on your side.

Well, it was less a matter of trying it than trying to read it.

Maya said:
I have tried your code and it worked very nicely, many thanks for that. One
last question (I promise!): is there a way to catch the exceptions thrown by
the GetResponse calls? For example, I get a lot of timeout error messages
from sites I can't reach. The code doesn't do anything when a timeout occurs
in the thread, and it hangs forever without breaking. Any ideas how to get
around this issue?

Setting the WebRequest Timeout property should help. (Off the top of
my head, I don't know whether you'll get a WebException from
EndGetResponse or GetResponseStream.)
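
One caveat worth checking against the documentation: as far as I know, the
Timeout property only governs the synchronous GetResponse call and has no
effect on requests started with BeginGetResponse, so an asynchronous request
needs its own time-out. Here is a minimal sketch of one way to do that, by
waiting on the IAsyncResult's wait handle and aborting the request if it takes
too long. The class name TimeoutSketch, the method Fetch and its parameters
are made up for illustration, and returning null on failure is just one
possible convention.

```csharp
using System;
using System.IO;
using System.Net;

static class TimeoutSketch
{
    // Hypothetical helper: returns the page text, or null if the request
    // failed or did not complete within timeoutMs milliseconds.
    public static string Fetch(string url, int timeoutMs)
    {
        WebRequest request = WebRequest.Create(url);
        IAsyncResult cookie = request.BeginGetResponse(null, null);

        // Enforce the time-out ourselves: if the request has not completed
        // in time, abort it so that EndGetResponse does not block forever.
        if (!cookie.AsyncWaitHandle.WaitOne(timeoutMs, false))
            request.Abort();

        try
        {
            // EndGetResponse throws a WebException for a failed or aborted request.
            using (WebResponse response = request.EndGetResponse(cookie))
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException e)
        {
            // e.Status says what went wrong (RequestCanceled after an Abort).
            Console.WriteLine(url + ": " + e.Status);
            return null;
        }
        catch (IOException e)
        {
            // The connection can also drop while the body is being read.
            Console.WriteLine(url + ": " + e.Message);
            return null;
        }
    }
}
```

Waiting on each handle in turn makes the time-outs add up in the worst case.
If that matters, the RegisterWaitForSingleObject call from the original code
achieves the same abort without blocking the calling thread.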

An alternative is to create a delegate to a method that takes a URL
and returns its contents (or null). This method would do a synchronous
GetResponse, which might make the error handling a bit more
straightforward. See, for example,
<http://www.devsource.com/article2/0,1895,1966478,00.asp>.
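
To make the delegate idea concrete, here is a rough sketch under some
assumptions: the names DelegateSketch, Fetcher, FetchOrNull and FetchAll are
invented for this example, the 30 second timeout is borrowed from the original
code, and it relies on delegate BeginInvoke/EndInvoke, which works on the .NET
Framework but is not supported on .NET Core and later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

static class DelegateSketch
{
    // Hypothetical delegate type: takes a URL, returns its contents (or null).
    delegate string Fetcher(string url);

    // Synchronous fetch, so Timeout applies and errors surface right here.
    static string FetchOrNull(string url)
    {
        try
        {
            WebRequest request = WebRequest.Create(url);
            request.Timeout = 30 * 1000; // milliseconds
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            return null; // unreachable site, time-out, HTTP error, ...
        }
        catch (IOException)
        {
            return null; // connection dropped while reading the body
        }
    }

    public static string[] FetchAll(string[] urls)
    {
        Fetcher fetch = FetchOrNull;

        // Start each fetch on a thread pool thread.
        List<IAsyncResult> cookies = new List<IAsyncResult>(urls.Length);
        foreach (string url in urls)
            cookies.Add(fetch.BeginInvoke(url, null, null));

        // Block until all return; pages[i] is null where urls[i] failed.
        string[] pages = new string[urls.Length];
        for (int i = 0; i < urls.Length; i++)
            pages[i] = fetch.EndInvoke(cookies[i]);
        return pages;
    }
}
```

The trade-off is that each pending request ties up a thread pool thread while
it waits, whereas BeginGetResponse does not, so this suits a modest number of
URLs better than a large crawl.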


 
