how to read a website


axis

I want to do a very simple thing -- hit a website and retrieve the html it
gives me (in C#). I have to admit I'm a little lost within the MSDN
documentation. The best I can figure is I should use
HttpWebRequest.Create(URL) to begin setting up the connection, but I don't
know where to go from there. Are there any online docs that show sample code
to retrieve the html that's returned? Or, what's the sequence of calls to
finally get a string representation of the page?

Thanks in advance
 
Hi Axis,

It takes a few steps. The following code is the basic way to do it, without setting any request headers (it needs the System.Net and System.IO namespaces):

HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.google.com");

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

StreamReader sr = new StreamReader(resp.GetResponseStream());

// get the text (StreamReader uses UTF-8 by default, but you can change it)
string html = sr.ReadToEnd();

//close the streams
sr.Close();
resp.Close();
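
The same steps can also be wrapped in a small helper so the response and reader are closed even if an exception is thrown. This is only a sketch: the class and method names (PageFetcher, ReadAll, FetchHtml) are made up for illustration, and UTF-8 is an assumption, since the page may declare a different encoding.

```csharp
using System.IO;
using System.Net;
using System.Text;

static class PageFetcher
{
    // Reads an entire stream as text using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads a page and returns its body as a string.
    // Assumes the body is UTF-8; pass another Encoding to ReadAll if it is not.
    public static string FetchHtml(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            return ReadAll(resp.GetResponseStream(), Encoding.UTF8);
        }
    }
}
```

Calling PageFetcher.FetchHtml("http://www.google.com") should then give the same string as the code above.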
 
Is there a way to do what you've done (navigate to a specific webpage) and
then take a "screenshot" programmatically? For example, going to the main
CSI (yes, the TV show :>) website with numerous pictures, formatting, etc.,
taking a screenshot or painting it to a panel/canvas/window, and then saving
that to a JPEG?

Thanks.
 
No, because this method retrieves only the HTML source of the page. Images and other resources are merely linked from that source; a browser fetches and renders them separately before displaying the page.

You may be able to use the ActiveX Web Browser object to open the page and take a screenshot of it, but I'm not sure.
 
I may be wrong, but I'm thinking you can't do that - at least not without an
interface. To do a screenshot, you need to take a picture of what is
actually being rendered to the screen. If it's not visible, you can't
capture it. If you don't mind a window popping up, you could open IE, size
the window accordingly (or put it into kiosk mode), then take a screenshot. Is
that what you want? Does the JPEG need to be a certain size? Can it be the
whole desktop area?
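
For the "whole desktop area" case, a rough sketch could look like the following. It assumes .NET 2.0 or later (for Graphics.CopyFromScreen), references to System.Drawing and System.Windows.Forms, and that the browser window is already visible on the primary screen. The class and method names are made up for illustration.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

static class DesktopCapture
{
    // Copies whatever is currently shown on the primary screen into a JPEG file.
    public static void SaveScreenshot(string path)
    {
        Rectangle bounds = Screen.PrimaryScreen.Bounds;
        using (Bitmap bmp = new Bitmap(bounds.Width, bounds.Height))
        {
            using (Graphics g = Graphics.FromImage(bmp))
            {
                // Source is the screen's top-left corner; destination is (0, 0) in the bitmap.
                g.CopyFromScreen(bounds.Location, Point.Empty, bounds.Size);
            }
            bmp.Save(path, ImageFormat.Jpeg);
        }
    }
}
```

This captures only what is actually on screen, so the page has to finish loading and the window has to be in front before SaveScreenshot is called.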
 
Howdy. That was quick! :>

> I may be wrong, but I'm thinking you can't do that - at least not without an
> interface. To do a screenshot, you need to take a picture of what is

Is there some type of object that does that? Sorry for the newbie-type
questions; I'm coming from Java and am trying to learn C# and .NET at the
same time. I thought making a dynamic screensaver would be cool.

> the window accordingly (or put into kiosk mode), then take a screenshot, is
> that what you want? Does the jpeg need to be a certain size? Can it be the
> whole desktop area?

Right now, I am just looking to get a rough idea of whether it can be done.
Any size is good! :> haha

Thanks.
 
Morten said:
Hi Axis,

It requires a few steps. The following code is the basic way to do it,
not using any headers

HttpWebRequest req =
(HttpWebRequest)HttpWebRequest.Create("http://www.google.com");

HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

StreamReader sr = new StreamReader(resp.GetResponseStream());

// get the text (StreamReader uses UTF-8 by default, but you can change it)
string html = sr.ReadToEnd();

//close the streams
sr.Close();
resp.Close();

Thanks! That's precisely the code I needed.

Now, normally if I were in C or C++ I'd go into building my custom HTML
parser -- I'm trying to extract data from an HTML page with a predictable
design (i.e. the content I want is always in particular locations in the
HTML hierarchy). Anyway, .NET is awesome in the sheer number of utility
classes already built, so the follow-up question -- is there a utility
class where I can feed in the HTML and it'll allow me to browse it
programmatically? Similar to the XmlDocument class for XML? I know
technically I could feed a well-formed HTML page in as XML, but I can
guarantee this page isn't well formed. Otherwise, I'll have lots of fun
writing regexps and substring ops.

Thanks again!
- Axis
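
For reference, the regexp fallback mentioned above might look roughly like this for a single, predictable element. It is only a sketch: the class name HtmlScrape and the choice of the title element as the target are assumptions made for illustration, and a pattern like this is not a general HTML parser.

```csharp
using System.Text.RegularExpressions;

static class HtmlScrape
{
    // Returns the trimmed text between the first <title> and </title>,
    // or null if the page has no title element.
    public static string ExtractTitle(string html)
    {
        // Singleline lets '.' match newlines; the lazy '.*?' stops at the first closing tag.
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```

The same idea works for other fixed locations in the page by changing the pattern, but it breaks as soon as the markup around the target changes.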
 