HttpRequest Question

Anthony Sullivan

I'm working on a quick and dirty little app that goes out to a website
and scrapes an XML response.

So far it's working fine, with one exception. The XML response has an xsl
stylesheet tag at the top, so that when a browser hits it the XML is
transformed. When I use the code below, I get the transformed version of the
page rather than the raw XML. Can anyone tell me how to get the raw XML
response that I see when I 'view source' in IE?

Here is the code:

HttpWebRequest oRequest =
    (HttpWebRequest)WebRequest.Create("Http://www.randomwebsite.com/somexmlfile.xml");
HttpWebResponse oResponse = (HttpWebResponse)oRequest.GetResponse();
Stream oStream = oResponse.GetResponseStream();
StreamReader oReader = new StreamReader(oStream);
Response.Write(oReader.ReadToEnd());
oResponse.Close();

Thanks!

Anthony
 
Ben Rush

What happens if you use System.Net.WebClient instead? I'm not sure whether this
will fix your problem or not...

System.Net.WebClient web = new System.Net.WebClient();

String data = web.DownloadString(pageURL);
 
Anthony Sullivan

Same thing unfortunately. :(

Any other ideas?

 
Patrice

Do you mean that you used "view source" and saw transformed content? If not,
you'll have to remove the stylesheet line. IMO you are already getting the
XML document, but because the stylesheet reference is still embedded in it,
the browser you stream the result to will still apply the stylesheet.

Otherwise I'll give this a try, but I doubt WebRequest would handle this
automatically. Another possibility is that the server itself does something
based on, for example, the user agent, and renders already transformed data
if it looks like the user agent doesn't support stylesheets...
 
Anthony Sullivan

When I browse to the site in IE and hit 'View Source' I see the raw,
untransformed XML.

When the browser processes the file it downloads the xsl and transforms it
for viewing.

What I'd like to do is grab the raw XML source. It's obviously available
somewhere, or I wouldn't see it when I 'view source'.

Any ideas?
 
Patrice

Just to confirm the behavior I assumed: when I read an XML document
such as http://msdn.microsoft.com/globalrss/en-us/global-msdn-en-us.xml, I
still get the unchanged XML document.

So:
- either you mistakenly thought the document was transformed because you saw
it in a browser (in which case it shouldn't be a problem),
- or the server renders an already transformed document when it thinks the
user agent is unaware of XML/XSL (in which case you could try to
provide the missing user agent information).
 
Patrice

See my confirmation at this same thread level.

For now it looks to me like you are confusing what you see in the
browser (the result of the XSL transformation) with what you really
have (the original XML file). Obviously, if you stream the response
you get from the server to a browser, the browser will still process the
file and display it transformed. If your program processes the
result, though, it will see the raw XML document and not
something transformed.

What do you want to do with this XML file? If you want to process it in
your code, IMO you already have the "raw" XML file. If you just want to
display it untransformed in the browser, you could simply remove
the stylesheet tag.
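
One way to remove the stylesheet tag before writing the document back to a browser is sketched below. This is only an illustration, assuming the reference is a standard xml-stylesheet processing instruction at the top of the document and that System.Xml.Linq is available; the class and method names (StylesheetStripper, StripStylesheet) are made up for this example and are not from the thread.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original thread.
public static class StylesheetStripper
{
    // Returns the XML with any top-level <?xml-stylesheet ...?> instruction removed.
    // Note: XDocument.ToString() does not emit the <?xml ...?> declaration.
    public static string StripStylesheet(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Document-level processing instructions whose target is "xml-stylesheet".
        doc.Nodes()
           .OfType<XProcessingInstruction>()
           .Where(pi => pi.Target == "xml-stylesheet")
           .Remove();

        return doc.ToString();
    }
}
```

With the instruction gone, a browser receiving the result has no stylesheet to apply, so it should show the plain XML tree instead of the transformed page.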
 
Anthony Sullivan

I'm sorry,

I'm probably not being completely clear.

When I hit view source I see XML. However, if I use the code from my
original post and save the response stream to a file, it isn't the XML but
rather the transformed HTML, even if I open it in Notepad, which removes the
browser from the equation.

I wish I knew a way to be more transparent but I don't.

Anthony
 
Latish Sehgal

Anthony,
Could you give a sample page where you are able to see the XML source?
I tried the code with the URL Patrice cited above, and it seems to
work.

string URL = "http://msdn.microsoft.com/globalrss/en-us/global-msdn-en-us.xml";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);

//Get the data as an HttpWebResponse object
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

//Convert the data into a string (assumes that you are requesting text)
StreamReader sr = new StreamReader(resp.GetResponseStream());

String results = sr.ReadToEnd();
sr.Close();
 
Patrice

IMO that leaves us with the second option I mentioned in a previous post, since
you are positive you get a transformed document from the server. That
is, if the server receives a request without a user agent string, it may
perform the transformation server side to support non-XML/XSL browsers.

Have you tried my suggestion to add a valid user agent string to your
request?
 
Anthony Sullivan

Hi Latish,

The code that Patrice provided seems to be doing the same thing. I've
uploaded a test page for you guys to look at; the link is below, along with
the code for that page. It's not terribly complex. What I
believe is happening is that the HttpWebRequest object is parsing the XML
much like a browser would. I don't believe that the transformed result is
what the server is delivering.

If you browse to the link that the code is requesting and "view source",
you'll see the raw XML; however, if you look at what my code outputs,
you can see that you get the transformed document instead.

Here is the link to my page.
http://armory.bobguild.net/default.aspx

Here is the link that the page is trying to request.
http://armory.worldofwarcraft.com/guild-info.xml?r=Duskwood&n=Band+òf+Brothers&p=1

Here is the code-behind for my page.
string URL =
    "http://armory.worldofwarcraft.com/guild-info.xml?r=Duskwood&n=Band+òf+Brothers&p=1";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);

//Get the data as an HttpWebResponse object
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

//Convert the data into a string (assumes that you are requesting text)
StreamReader sr = new StreamReader(resp.GetResponseStream());

String results = sr.ReadToEnd();
sr.Close();

Response.Write(results);

Thanks!

Anthony Sullivan
 
Latish Sehgal

Patrice had the correct solution.
Add the req.UserAgent line right after creating the request:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
req.UserAgent = @"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
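
Put together with the earlier fetch code, the whole thing might look roughly like the sketch below. The class and method names (RawXmlFetcher, CreateRequest, Fetch) are invented for illustration; only the UserAgent value and the HttpWebRequest calls come from this thread. Whether a given server changes its output based on the user agent is specific to that server, so treat this as something to try rather than a guaranteed fix. On recent .NET versions WebRequest.Create still works but produces an obsolescence warning.

```csharp
using System.IO;
using System.Net;

// Hypothetical wrapper, not part of the original thread.
public static class RawXmlFetcher
{
    // User agent string suggested in the thread; it makes the request look like IE 6.
    public const string BrowserUserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    // Builds the request without sending it, so the headers can be inspected.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.UserAgent = BrowserUserAgent;
        return req;
    }

    // Sends the request and returns the response body as text.
    public static string Fetch(string url)
    {
        HttpWebRequest req = CreateRequest(url);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
        {
            return sr.ReadToEnd();
        }
    }
}
```

The using blocks close the response and reader even if reading throws, which the explicit sr.Close() in the earlier snippets does not guarantee.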

Latish
 
Anthony Sullivan

If this works I'll hump your leg. ;)

*Checking*

It did. Where can I find your leg?

Thanks!

Anthony Sullivan
 
