Why does HttpWebRequest.GetResponse() get stuck?


Morgan Cheng

I happened to come across
http://www.codeproject.com/cs/internet/Crawler.asp, which claims that
WebRequest.GetResponse() will block other threads calling this function
until WebResponse.Close() is called.

I did some experimentation.

public static void Main(string[] args)
{
    for (int idx = 0; idx < 10; ++idx)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(testWeb), idx);
    }
}

private static void testWeb(object idx)
{
    string uri = "http://www.gmail.com";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    Console.WriteLine("in thread " + idx);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    Console.WriteLine(response.ContentType + "; idx = " + (int)idx);
    // response.Close();
}

The code runs with output like below:
in thread 0
in thread 1
text/html; charset=UTF-8; idx=0
text/html; charset=UTF-8; idx=1
in thread 2
in thread 3
in thread 4
in thread 5
in thread 6
in thread 7
in thread 8
in thread 9


"idx" may take other values, but only 2 threads get through GetResponse()
every time. The other 8 threads seem to be stuck at
HttpWebRequest.GetResponse().

After I un-comment the line "response.Close()", it prints the expected 20
lines. Something must be held by the HttpWebResponse until it is closed.

Does an HttpWebResponse instance occupy some resource of which only 2
instances are available? If this is the case, it is a real issue for
applications that need many WebResponse instances, e.g. a web crawler.
 

Jesse Houwing

Morgan said:
I happened to come across
http://www.codeproject.com/cs/internet/Crawler.asp, which claims that
WebRequest.GetResponse() will block other threads calling this function
until WebResponse.Close() is called.
[snipped: test code and output]
Does an HttpWebResponse instance occupy some resource of which only 2
instances are available? If this is the case, it is a real issue for
applications that need many WebResponse instances, e.g. a web crawler.


The response stream is left open for you to examine the data returned by
the webresponse, but the body is actually only downloaded when you need
it (to prevent unneeded data transfers and to make sure you get the
response within a reasonable amount of time).

I'm not sure why the number is two, but it is good practice to keep the
number of concurrent connections you open to one site to a minimum, so
that you don't overload the site in question. The WebClient class
automatically makes sure you don't open too many connections.
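
For simple one-shot fetches, a sketch along these lines lets WebClient do the open/read/close bookkeeping itself (the URL is just an example; DownloadString blocks until the whole body has arrived):

```csharp
using System;
using System.Net;

public class WebClientDemo
{
    public static void Main()
    {
        // WebClient opens the connection, reads the whole body,
        // and closes the underlying response for you.
        using (WebClient client = new WebClient())
        {
            string html = client.DownloadString("http://www.google.com");
            Console.WriteLine("downloaded " + html.Length + " chars");
        }
    }
}
```

This avoids the unclosed-response problem entirely for small downloads, at the cost of buffering the whole body in memory.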

By the way, you shouldn't just call Close after you've gotten the
webresponse. If anything throws in between, the connection is likely to
remain open for some time, which is not what you want. To make sure it
is closed in time, add a using statement:

using System.Threading;
using System.IO;
using System.Net;
using System;

public class TestConsoleApp
{
    public static void Main(string[] args)
    {
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(testWeb), idx);
        }
        Console.ReadLine();
    }

    private static void testWeb(object idx)
    {
        string uri = "http://www.gmail.com";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;
        Console.WriteLine("in thread " + idx);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine(response.ContentType + "; idx = " + (int)idx);
        }
    }
}

The WebResponse is then automatically closed once it goes out of scope.

You can see that this only happens if you try to open many connections
to the same website. I've altered your test to show this:

using System.Threading;
using System.IO;
using System.Net;
using System;

public class TestConsoleApp
{
    private static string[] _urls = new string[]
    {
        "http://www.gmail.com",
        "http://www.google.com",
        "http://www.google.co.uk",
        "http://www.google.nl",
        "http://www.google.ie",
        "http://www.google.de",
        "http://www.amazon.com",
        "http://www.microsoft.com",
        "http://www.tweakers.net",
        "http://www.cnn.com"
    };

    private static string[] _urlsSame = new string[]
    {
        "http://www.gmail.com",
        "http://www.gmail.com",
        "http://www.gmail.com",
        "http://www.cnn.com",
        "http://www.cnn.com",
        "http://www.cnn.com",
        "http://www.cnn.com",
        "http://www.google.com",
        "http://www.google.com",
        "http://www.google.com"
    };

    public static void Main(string[] args)
    {
        Console.WriteLine("Test A");
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(testWebWorking), _urls[idx]);
        }

        Console.ReadLine();

        Console.WriteLine("Test B");
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(testWebFaulty), _urls[idx]);
        }

        Console.ReadLine();

        Console.WriteLine("Test B");
        for (int idx = 0; idx < 10; ++idx)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(testWebFaulty), _urlsSame[idx]);
        }

        Console.ReadLine();
    }

    private static void testWebWorking(object url)
    {
        string uri = (string)url;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;
        Console.WriteLine("opening: " + uri);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine(response.ContentType + "; uri = " + uri);
        }
    }

    private static void testWebFaulty(object url)
    {
        string uri = (string)url;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.KeepAlive = false;
        Console.WriteLine("opening: " + uri);
        // Note: the response is never closed here.
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        Console.WriteLine(response.ContentType + "; uri = " + uri);
    }
}

Test A works regardless of which URIs you feed it.
Test B (without using) only works if there are not too many connections
to the same server: the first Test B run (distinct hosts) will succeed,
the second run (repeated hosts) will get stuck.
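
Putting the pieces together, a crawler-style fetch helper along these lines (a sketch; the Fetch name is mine) reads the whole body and disposes the response, so the connection is released back to the pool even if reading throws:

```csharp
using System;
using System.IO;
using System.Net;

public class FetchDemo
{
    // Fetch a page as a string, releasing the connection promptly.
    static string Fetch(string uri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        // Dispose both the response and its stream, even on exceptions.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Fetch("http://www.google.com").Length);
    }
}
```

With this pattern many threads can hit the same host; each one only holds a pooled connection for the duration of a single download.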

Jesse Houwing
 

Steven Nagy

Isn't there some OS rule that prevents you from opening more than 2
connections to a domain at once? I'm sure there is. How that directly
affects your code, I'm not sure.
 

Morgan Cheng

Jesse said:
The response stream is left open for you to examine the data returned by
the webresponse, but it's actually only downloaded when you need it (to
prevent unneeded data transfers and to make sure you get the response
within a reasonable amount of time).
Do you mean that HttpWebRequest.GetResponse() doesn't download the URI
resource to the local machine? I tried fetching some big resources.
GetResponse() takes time, while response.GetResponseStream() returns
immediately. I believe the downloading happens in GetResponse().

I'm not sure why the number is two, but it is good practice to keep the
number of concurrent connections you open to one site to a minimum, so
that you don't overload the site in question. The WebClient class
automatically makes sure you don't open too many connections.

I checked the HTTP/1.1 protocol. RFC 2616 section 8.1.4 reads:

Clients that use persistent connections SHOULD limit the number of
simultaneous connections that they maintain to a given server. A
single-user client SHOULD NOT maintain more than 2 connections with
any server or proxy. A proxy SHOULD use up to 2*N connections to
another server or proxy, where N is the number of simultaneously
active users. These guidelines are intended to improve HTTP response
times and avoid congestion.

I believe that is why the .NET Framework limits connections to one host
to no more than 2.
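
For what it's worth, that default is configurable: ServicePointManager.DefaultConnectionLimit sets the default for new connections, and an individual ServicePoint's ConnectionLimit can be raised per host. A sketch (the host URL is just an example):

```csharp
using System;
using System.Net;

public class ConnectionLimitDemo
{
    public static void Main()
    {
        // Applies to ServicePoints created after this point.
        ServicePointManager.DefaultConnectionLimit = 10;

        // FindServicePoint does not open a connection; it just returns
        // the ServicePoint object that manages connections to this host.
        ServicePoint sp = ServicePointManager.FindServicePoint(
            new Uri("http://www.gmail.com"));
        sp.ConnectionLimit = 10;
        Console.WriteLine("per-host limit: " + sp.ConnectionLimit);
    }
}
```

A crawler would typically raise this, but still cap it at something polite per host.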
By the way, you shouldn't just call Close after you've gotten the
webresponse. To make sure it is closed in time, add a using statement:
[snipped: code quoted above]

The WebResponse is then automatically closed once it goes out of scope.
That's cool. Thanks.
But how does the CLR know that response.Close() should be called when
the variable goes out of scope? Does the CLR always call Close() for
the using keyword?
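
For the record, `using` is compiler sugar: it expands to a try/finally that calls Dispose() on the variable, not Close() directly (WebResponse implements IDisposable, and its Dispose implementation calls Close). A small sketch with a stand-in class shows the equivalence:

```csharp
using System;

// A minimal IDisposable stand-in to show what the compiler generates.
class Resource : IDisposable
{
    public void Dispose()
    {
        Console.WriteLine("Dispose called");
    }
}

public class UsingDemo
{
    public static void Main()
    {
        // This block...
        using (Resource r = new Resource())
        {
            Console.WriteLine("inside using");
        }

        // ...is equivalent to this expansion:
        Resource r2 = new Resource();
        try
        {
            Console.WriteLine("inside try");
        }
        finally
        {
            if (r2 != null) r2.Dispose();
        }
        // Both print their body line followed by "Dispose called".
    }
}
```

So the cleanup is guaranteed even when the body throws, which is exactly why Jesse recommends it over a bare Close() call.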
 

Morgan Cheng

Steven said:
Perhaps that property is only for managing incoming connections and
restricting the number of connections your app will accept?

I believe the property is for outgoing connections. In my understanding
of this article, the connection setting is on the client side:
"server. For simulating multiple clients sending simultaneous requests
to the remote object, we changed the default of 2 to 100 connections to
the server per client using the client's configuration".

Steven said:
If the remote site is not configured to accept more than 2 connections
per client, then I can't see how you can get around this problem.

Yes, a web server COULD limit connections from the same IP address, but
normally it won't. Since many clients are behind a proxy, limiting
connections from one IP address (in this case, from one proxy) would
impact all clients behind that proxy.

Actually, I believe webmasters prefer more web access :)
Since almost all clients have this 2-connections-per-host limit, a
server might want to work around it. I heard that Yahoo! has some
techniques to trick browsers into accessing its site with more than 2
connections. I'm not clear how Yahoo! does it.
 

Steven Nagy

Not sure if this is relevant, but I remember something at Tech Ed about
Virtual Earth being spread across multiple domains so that IE could
open many connections at once to get the data back faster, as opposed
to just one domain restricting to 2 connections.
 
