Empty HttpWebResponse length for some url (with queries?)

Guest

For some URLs
(e.g. http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005028634&F=0&QPN=WO2005028634),
the content length for the HttpWebResponse I get with request.GetResponse
is empty. The response.GetResponseStream() is also empty. However, I am able
to open the URL in the browser (the URL address remains the same; it is not
redirected).

Here is the code snippet:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(pageAddress);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(resp.GetResponseStream());
string pageData = sr.ReadToEnd();

The Content Type for the response is "text/html; charset=iso-8859-1" and the
HttpStatusCode was OK. The pageData length is 0.

Should the URL be modified for it to work (e.g. some substitution)?

What am I missing?

Thanks

Jason
 
octavio.filipe

Hi,

You are using this:

StreamReader sr = new StreamReader(resp.GetResponseStream());

Shouldn't you be using this:

StreamReader sr = new StreamReader(req.GetResponse().GetResponseStream());

The rest seems fine to me!
Hope it helps...
o.f
 
Michael McCarthy

Is the length empty or is the object empty? Streams from web requests don't
allow seeking by default, so the stream won't know its length offhand. If you
put a breakpoint in the code after it gets the stream, you should be able to
hover over the stream and see whether it's null. It shouldn't be null unless
there was an error, but it may not know the length.
 
Guest

The object is not empty. The length of pageData (created with
StreamReader.ReadToEnd()) is 0, and resp.ContentLength is also 0. But the
response does show info like ContentType (which is
"text/html; charset=iso-8859-1"). The HttpStatusCode is also OK. This problem
occurs only for this URL. Is there a special requirement in the WebRequest for
URLs with query strings in them?
 
Guest

I also found in the response header info like:

Set-Cookie: JSESSIONID=0000AQWFK7gpN1TDbm0IKxJ94Tu:10dg77lr6;Path=/
 
Joerg Jooss

Jason said:
For some URLs
(e.g. http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005028634&F=0&QPN=WO2005028634),
the content length for the HttpWebResponse I get with request.GetResponse
is empty. The response.GetResponseStream() is also empty. However, I am
able to open the URL in the browser (the URL address remains the same; it
is not redirected).

Here is the code snippet:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(pageAddress);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(resp.GetResponseStream());
string pageData = sr.ReadToEnd();

The Content Type for the response is "text/html; charset=iso-8859-1"
and the HttpStatusCode was OK. The pageData length is 0.

Should the URL be modified for it to work (e.g. some substitution)?

What am I missing?

The character encoding, for example. You're using a UTF-8 StreamReader
to read ISO-8859-1, which will cause problems. Create an ISO-8859-1
StreamReader to decode the response properly:

Encoding enc = Encoding.GetEncoding(28591);
StreamReader reader = new StreamReader(responseStream, enc);

But you're right about the Content-Length: The server sends
"Content-Length: 0", which is a disastrous error. It is quite possible
that this breaks HttpWebResponse.
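
As a variation on the encoding advice above, the charset could be taken from the response instead of being hard-coded. The following is only a sketch: the class and method names (ResponseEncoding, FromCharset) are made up for illustration, and falling back to ISO-8859-1 when the charset is missing or unrecognized is an assumption, not something the thread prescribes.

```csharp
using System;
using System.Text;

public static class ResponseEncoding
{
    // Hypothetical helper: maps a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding. Falls back to
    // ISO-8859-1 (code page 28591) when the name is missing or unknown;
    // that fallback is an assumption made for this sketch.
    public static Encoding FromCharset(string charset)
    {
        if (!string.IsNullOrEmpty(charset))
        {
            try
            {
                // Strip surrounding whitespace and quotes before the lookup.
                return Encoding.GetEncoding(charset.Trim(' ', '"'));
            }
            catch (ArgumentException)
            {
                // Unknown charset name: fall through to the default below.
            }
        }
        return Encoding.GetEncoding(28591);
    }
}
```

It could then be used as new StreamReader(resp.GetResponseStream(), ResponseEncoding.FromCharset(resp.CharacterSet)).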

Cheers,
 
Feroze [msft]

I looked at this in some detail. I think the server is looking for a
User-Agent header in the request. If it finds one, it sends a response page.
Otherwise it doesn't.

You can take a look at the attached program. Set the proxy if needed. Run
the program with the UserAgent set, and you will see the page getting
downloaded. Then rerun it with that line commented out, and the response
will have a Content-Length of zero.




using System;
using System.IO;
using System.Text;
using System.Net;
using System.Net.Sockets;

public class EP {
    public static void Main(string[] args)
    {
        string us =
            "http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=WO2005028634&F=0&QPN=WO2005028634";
        HttpWebRequest req = WebRequest.Create(us) as HttpWebRequest;
        // Comment out the next line to reproduce the zero-length response.
        req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.7.6) Gecko/20";
        //req.Proxy = new WebProxy("http://my-proxy");

        try {
            HttpWebResponse resp = req.GetResponse() as HttpWebResponse;

            // Print the response headers.
            foreach (string h in resp.Headers) {
                Console.WriteLine(h + ": " + resp.Headers[h]);
            }

            Stream rs = resp.GetResponseStream();
            MemoryStream ms = new MemoryStream();

            // Copy the response body into memory, 1 KB at a time.
            byte[] data = new byte[1024];
            int read = rs.Read(data, 0, data.Length);
            while (read > 0) {
                ms.Write(data, 0, read);
                read = rs.Read(data, 0, data.Length);
            }

            resp.Close();

            // Rewind the buffer and print it as text.
            ms.Seek(0, SeekOrigin.Begin);

            StreamReader sr = new StreamReader(ms);
            String s = sr.ReadToEnd();
            sr.Close();

            Console.WriteLine("====");
            Console.WriteLine(s);

        } catch (Exception e) {
            Console.WriteLine(e);
        }
    }
}
 
Guest

I realized (by experimenting) that setting the UserAgent in HttpWebRequest
solved the problem. But, hey, I could access most websites without setting
it.

req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
resp = (HttpWebResponse)req.GetResponse();
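
Putting the fix above together with the original snippet might look like the sketch below. The names PageFetcher, CreateRequest and Fetch are hypothetical, and the ISO-8859-1 reader is an assumption based on the Content-Type reported earlier in the thread for this one site.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Builds the request without sending it. The User-Agent value is
    // whatever the caller supplies; nothing here validates it.
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.UserAgent = userAgent;
        return req;
    }

    // Sends the request and returns the body as text. Code page 28591
    // (ISO-8859-1) is assumed because that is what this server reported.
    public static string Fetch(string url, string userAgent)
    {
        HttpWebRequest req = CreateRequest(url, userAgent);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        using (StreamReader reader = new StreamReader(body, Encoding.GetEncoding(28591)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as PageFetcher.Fetch(pageAddress, "Mozilla/4.0 (compatible; MSIE 6.0; Win32)") would then replace the four lines from the first post; the using blocks also close the response and the stream, which the original snippet did not do.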
 
Michael McCarthy

Joerg said:
But you're right about the Content-Length: The server sends
"Content-Length: 0", which is a disastrous error. It is quite possible
that this breaks HttpWebResponse.

But the content length would only need to be a specific length when using
keep-alive with HTTP 1.1+, which we can assume we don't care about when
using .NET, since we can use a common connection pool that deprecates
keep-alive.

Think so?

~Michael.
 
Joerg Jooss

Michael said:
But the content length would only need to be a specific length when
using keep-alive with HTTP 1.1+, which we can assume we don't care
about when using .NET, since we can use a common connection pool that
deprecates keep-alive.

Think so?

In HTTP 1.1 you either

a) specify a Content-Length
b) use Transfer-Encoding: chunked
c) use Connection: close

Using c) as a request header disables the use of persistent connections
("keep-alive" is an outdated HTTP 1.0+ term).

I don't get the part about connection pooling. A connection pool only
makes sense for, um, pooling connections to well-known communication
endpoints, e.g. a database or an EIS. HTTP clients in general
communicate with a lot of *different* endpoints, so what exactly do
you want to pool? And regardless of pooling, "Connection: close" must
terminate the TCP connection:

"Once a close has been signaled, the client MUST NOT send any more
requests on that connection." (§8.1.2)
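
The three options a) to c) above can be read as the order in which a client works out where a response body ends. The sketch below is a simplification with hypothetical names (BodyFraming, Framing.Decide): it assumes the caller passes a case-insensitive header dictionary, treats any Transfer-Encoding value containing "chunked" as chunked, and ignores responses that never carry a body (such as replies to HEAD).

```csharp
using System;
using System.Collections.Generic;

public enum BodyFraming { Chunked, ContentLength, UntilClose }

public static class Framing
{
    // Decides how the end of a response body is found.
    // Assumes 'headers' was created with a case-insensitive comparer,
    // e.g. StringComparer.OrdinalIgnoreCase.
    public static BodyFraming Decide(IDictionary<string, string> headers)
    {
        // b) Transfer-Encoding: chunked takes precedence over Content-Length.
        string te;
        if (headers.TryGetValue("Transfer-Encoding", out te) && te != null &&
            te.IndexOf("chunked", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return BodyFraming.Chunked;
        }

        // a) A valid Content-Length gives the exact number of body bytes.
        string cl;
        long length;
        if (headers.TryGetValue("Content-Length", out cl) &&
            long.TryParse(cl, out length) && length >= 0)
        {
            return BodyFraming.ContentLength;
        }

        // c) Otherwise the body runs until the server closes the connection.
        return BodyFraming.UntilClose;
    }
}
```

Under this reading, a response carrying "Content-Length: 0" falls into case a) with zero body bytes, which is consistent with the empty pageData reported at the start of the thread.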

Cheers,
 
