HttpRequest Question

Anthony Sullivan · Mar 28, 2007

I'm working on a quick and dirty little app that will go out to a website
and scrape an xml response.

So far it's working fine, with one exception. The XML response has an xsl
stylesheet tag at the top so that when a browser hits it the XML is
transformed. When I use the code below I get the transformed version of the
page rather than the raw XML. Can anyone tell me how to get the raw XML
response that I see when I 'view source' in IE?

Here is the code:

HttpWebRequest oRequest = (HttpWebRequest)WebRequest.Create("Http://www.randomwebsite.com/somexmlfile.xml");
HttpWebResponse oResponse = (HttpWebResponse)oRequest.GetResponse();
Stream oStream = oResponse.GetResponseStream();
StreamReader oReader = new StreamReader(oStream);
Response.Write(oReader.ReadToEnd());
oResponse.Close();

Thanks!

Anthony

Ben Rush · Mar 28, 2007

What happens if you use System.Net.WebClient instead? I'm not sure whether
this will fix your problem or not...

System.Net.WebClient web = new System.Net.WebClient();
String data = web.DownloadString(pageURL);

Anthony Sullivan · Mar 28, 2007

Same thing unfortunately.

Any other ideas?

Patrice · Mar 28, 2007

Do you mean that you used "view source" and still saw transformed content?
If not, you'll have to remove the stylesheet line. IMO you are already
getting the XML document, but because the stylesheet reference is still
embedded in it, the browser to which you stream the result will still apply
the stylesheet.

Otherwise I'll give this a try, but I doubt WebRequest would handle this
automatically. Another possibility is that the server itself does something
based, for example, on the user agent, and renders already-transformed data
if the user agent looks like it doesn't support stylesheets...

Anthony Sullivan · Mar 28, 2007

When I browse to the site in IE and hit 'View Source' I see the raw,
untransformed XML.

When the browser processes the file it downloads the xsl and transforms it
for viewing.

What I'd like to do is grab the raw XML source. It's obviously available
somewhere or I wouldn't see it when I 'view source'.

Any ideas?

Patrice · Mar 28, 2007

Just to confirm the behavior I assumed: when reading an XML document
such as http://msdn.microsoft.com/globalrss/en-us/global-msdn-en-us.xml, I
still get the unchanged XML document.

So:
- either you mistakenly thought the document was transformed because you saw
it in a browser (in which case it shouldn't be a problem),
- or the server renders an already-transformed document when the user
agent is thought to be unaware of XML/XSL (in which case you could try
providing the missing user agent information).
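One way to tell these two cases apart without a browser in the loop is to look at the first characters of what the code actually received. The following is only a rough sketch: PayloadCheck and LooksLikeHtml are hypothetical names, and the check is a crude heuristic (an XHTML page that begins with an XML declaration would slip past it).

```csharp
using System;

static class PayloadCheck
{
    // Hypothetical helper: returns true when the text begins like an HTML
    // page (a doctype or an <html> root) rather than like an XML document.
    public static bool LooksLikeHtml(string body)
    {
        string start = body.TrimStart();
        return start.StartsWith("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase)
            || start.StartsWith("<html", StringComparison.OrdinalIgnoreCase);
    }
}
```

Passing it the string returned by ReadToEnd() (or the contents of the saved file) would show whether the server itself sent HTML, which is what the second case predicts.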

Patrice · Mar 28, 2007

See my confirmation at this same thread level.

For now it looks to me like you are confusing what you see in the
browser (the result of the XSL transformation) with what you really
have (the original XML file). Obviously, if you stream the response
you get from the server to a browser, the browser will still process the
file and display it as if it were transformed. If your program processes the
result itself, though, it will see this content as the raw XML document and
not as something transformed.

What do you want to do with this XML file? If you want to process it in
your code, you IMO already have the "raw" XML file. If you just want to
display it as an untransformed file in the browser, you could simply remove
the stylesheet tag.
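For illustration, removing the stylesheet tag could look roughly like the sketch below. It assumes System.Xml.Linq is available (.NET 3.5 or later) and that the downloaded text is well-formed XML; StylesheetStripper and RemoveStylesheet are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class StylesheetStripper
{
    // Hypothetical helper: parses the XML text and drops every top-level
    // <?xml-stylesheet ...?> processing instruction, so a browser that
    // receives the result has no stylesheet to apply.
    public static string RemoveStylesheet(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Copy the matches to a list first so that removing nodes does not
        // disturb the enumeration of doc.Nodes().
        List<XProcessingInstruction> stylesheets = doc.Nodes()
            .OfType<XProcessingInstruction>()
            .Where(pi => pi.Target == "xml-stylesheet")
            .ToList();

        foreach (XProcessingInstruction pi in stylesheets)
        {
            pi.Remove();
        }

        // Note: XDocument.ToString() leaves out the <?xml ...?> declaration.
        return doc.ToString();
    }
}
```

The returned string could then be passed to Response.Write in place of the unmodified response text.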

Anthony Sullivan · Mar 28, 2007

I'm sorry,

I'm probably not being completely clear.

When I hit view source I see XML. However, if I use the code from my
original post and save the response stream to a file, it isn't the XML but
rather the transformed HTML, even if I open it in Notepad, which removes the
browser element from the equation.

I wish I knew a way to be more transparent but I don't.

Anthony

Anthony Sullivan · Mar 28, 2007

Anyone else have any ideas?

Latish Sehgal · Mar 29, 2007

Anthony
Could you give a sample page where you are able to see the xml source?
I tried the code with the url Patrice cited above, and it seems to
work.

string URL = "http://msdn.microsoft.com/globalrss/en-us/global-msdn-en-us.xml";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);

//Get the data as an HttpWebResponse object
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

//Convert the data into a string (assumes that you are requesting text)
StreamReader sr = new StreamReader(resp.GetResponseStream());

String results = sr.ReadToEnd();
sr.Close();

Patrice · Mar 29, 2007

IMO that leaves us with the second option I talked about in a previous post,
since you are positive that you get a transformed document from the server.
That is, if the server receives a request without a user agent string, it
could perform the transformation server side to support non-XML/XSL browsers.

Have you tried my suggestion of adding a valid user agent string to your
request?

Anthony Sullivan · Mar 29, 2007

Hi Latish,

The code that Patrice provided seems to be doing the same thing. I've
uploaded a test page for you guys to look at; the link and the code for that
page are below. It's not terribly complex. What I believe is happening is
that the HttpWebRequest object is parsing the XML much like a browser would.
I don't believe that the translated result is what the server is delivering.

If you browse out to the link that the code is requesting and "view source"
you'll see the raw xml, however if you look at what my code is outputting
you can see that you get the transformed document instead.

Here is the link to my page.
http://armory.bobguild.net/default.aspx

Here is the link that the page is trying to request.
http://armory.worldofwarcraft.com/guild-info.xml?r=Duskwood&n=Band+òf+Brothers&p=1

Here is the code-behind for my page.
string URL = "http://armory.worldofwarcraft.com/guild-info.xml?r=Duskwood&n=Band+òf+Brothers&p=1";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);

//Get the data as an HttpWebResponse object
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

//Convert the data into a string (assumes that you are requesting text)
StreamReader sr = new StreamReader(resp.GetResponseStream());

String results = sr.ReadToEnd();
sr.Close();

Response.Write(results);

Thanks!

Anthony Sullivan

Anthony Sullivan · Mar 29, 2007

See my response to Latish Sehgal further down the thread.

Thanks!

Latish Sehgal · Mar 29, 2007

Patrice had the correct solution.
Set the UserAgent property right after creating the request:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
req.UserAgent = @"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

Latish
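Putting the pieces from this thread together, a complete fetch with a browser-like user agent might look something like the sketch below. RawXmlFetcher, CreateRequest and Download are hypothetical names; the user agent string is the one given above, and whether a particular server needs it is something to verify per site.

```csharp
using System.IO;
using System.Net;

static class RawXmlFetcher
{
    // Hypothetical helper: builds the request and sets a browser-like
    // User-Agent so that a server which sniffs the user agent treats the
    // caller as XSL-capable and returns the untransformed XML.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
        return req;
    }

    // Sends the request and returns the response body as text. The using
    // blocks close the response and the reader even if reading fails.
    public static string Download(string url)
    {
        HttpWebRequest req = CreateRequest(url);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as string results = RawXmlFetcher.Download(URL); would stand in for the request and reader code posted earlier in the thread.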

Anthony Sullivan · Mar 29, 2007

If this works I'll hump your leg.

*Checking*

It did. Where can I find your leg?

Thanks!

Anthony Sullivan
