Retrieving code of a particular URL


Xarky

Hi,
I am trying to retrieve the source code of a particular url, but it
seems that I am doing something wrong.

HttpWebRequest webRequest;
HttpWebResponse webResponse;
StreamReader reader;
string text="";
int pos=-1;

webRequest = (HttpWebRequest) WebRequest.Create(data);
webResponse = (HttpWebResponse) webRequest.GetResponse();
reader = new StreamReader(webResponse.GetResponseStream(),
Encoding.ASCII);
text = reader.ReadToEnd();
pos = text.IndexOf(" out of 5 stars", 0);

The variable data contains a URL such as the following:
http://www.amazon.com/exec/obidos/A...50-0321606?dev-t=sdas&camp=2025&link_code=sp1

When I open the link manually in a browser, choose View Source and
search for " out of 5 stars", the string is found, but when I do the
same search in code, it is not found.

Can someone give me some help?


Thanks in Advance
 

Morten Wennevik

Hi Xarky,

I doubt the web page is ASCII encoded. Most likely it is UTF-8, which is
the default encoding for the StreamReader class, or standard 8-bit
(ISO-8859-1). You can detect the encoding using
HttpWebResponse.CharacterSet (ContentEncoding reports compression such as
gzip, not the character set) or by reading the charset declared in the
source. See the thread "Is this an encoding problem". However, checking
the page I found no encoding markers whatsoever.
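
As a rough sketch of the charset check (not code from this thread): HttpWebResponse.CharacterSet exposes the charset named in the Content-Type header, and Encoding.GetEncoding turns that name into an Encoding. The helper name ResolveEncoding and the fallback to UTF-8 are assumptions made for illustration only.

```csharp
using System;
using System.Text;

public static class EncodingHelper
{
    // Hypothetical helper: maps a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding. Falls back to UTF-8
    // when the name is missing or not recognised (an assumed default).
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;

        try
        {
            // Some servers wrap the charset name in quotes; strip them.
            return Encoding.GetEncoding(charset.Trim(' ', '"'));
        }
        catch (ArgumentException)
        {
            // Unknown or malformed charset name.
            return Encoding.UTF8;
        }
    }
}
```

It could then be passed to the reader in place of a fixed encoding, for example new StreamReader(webResponse.GetResponseStream(), EncodingHelper.ResolveEncoding(webResponse.CharacterSet)).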

Btw, using UTF-8 I found "out of 5 stars" at position 19421.
I should go nag Amazon about their horribly non-standard HTML code.
 

xarky

I changed this line to use Encoding.UTF8, but it still does not find the
string.

reader = new StreamReader(webResponse.GetResponseStream(),
Encoding.UTF8);
 

Morten Wennevik

Hi xarky,

Upon further examination it appears the encoding isn't to blame after all.
In fact there is no " out of 5 stars" in the source at all.
There is a linebreak before "out", so " \nout of 5 stars" will give the
correct value.

If there is always a linebreak at that position you will be fine,
otherwise you might want to strip away all linebreaks before searching.
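
A minimal sketch of that idea, with one change worth naming: instead of deleting line breaks outright, it collapses every run of whitespace into a single space, so words separated only by a line break do not get glued together. The class and method names are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlSearch
{
    // Collapses every run of whitespace (spaces, tabs, \r, \n) into a
    // single space so IndexOf no longer depends on where the markup wraps.
    public static string CollapseWhitespace(string html)
    {
        return Regex.Replace(html, @"\s+", " ");
    }
}
```

With this, HtmlSearch.CollapseWhitespace(text).IndexOf(" out of 5 stars", StringComparison.Ordinal) should match regardless of the line break. Note that the returned position refers to the collapsed string, not to the original text.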
 

xarky

Hi,

Is there another way of downloading the HTML code than the one shown
below? It is not working correctly, although it should be.

HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(data);
HttpWebResponse webResponse = (HttpWebResponse)
webRequest.GetResponse();
StreamReader reader = new StreamReader(webResponse.GetResponseStream(),
Encoding.UTF8);
text = reader.ReadToEnd();
 

Morten Wennevik

Xarky,

Your method should work, but unlike searching in Internet Explorer you
will need to beware of line breaks when using IndexOf. The downloaded text
is exactly the same as what Internet Explorer sees.

Other methods of downloading can be using
WebClient.DownloadFile/DownloadData or reading bytes from the Stream you
get from HttpWebResponse.GetResponseStream(). The downloaded text would
be the same as your method.
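
For illustration, the WebClient.DownloadData route might look roughly like the sketch below. The wrapper name DownloadText is hypothetical, and the caller is assumed to know (or guess) the page encoding.

```csharp
using System;
using System.Net;
using System.Text;

public static class PageDownloader
{
    // Hypothetical wrapper: downloads the raw bytes at the given address
    // with WebClient.DownloadData and decodes them with the given encoding.
    public static string DownloadText(string address, Encoding encoding)
    {
        using (WebClient client = new WebClient())
        {
            byte[] raw = client.DownloadData(address);
            return encoding.GetString(raw);
        }
    }
}
```

A call such as PageDownloader.DownloadText(data, Encoding.UTF8) would return the same markup as the StreamReader approach, so the line break issue still applies.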

It may be that a Regex pattern will allow you to search for " out of 5
stars" and allow for linebreaks inside the pattern.
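
One possible pattern, as a sketch rather than a tested solution for the actual Amazon page: \s+ matches one or more whitespace characters, including line breaks, so it can stand in for each space in the search string. The class and method names are invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class RatingSearch
{
    // Returns the index of the first " out of 5 stars" match, allowing any
    // whitespace (including line breaks) between the words, or -1 if the
    // text contains no such match.
    public static int FindRating(string html)
    {
        Match m = Regex.Match(html, @"\s+out\s+of\s+5\s+stars");
        return m.Success ? m.Index : -1;
    }
}
```

Unlike stripping line breaks first, this keeps the original text intact, so the returned index is a position in the downloaded string and plays the same role as pos in the original code.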
 
