Load all HTML into a string

  • Thread starter: Guest

Hey, I want to get the entire contents of an HTML page and put all the HTML
code it returns into a string, so that I can parse that string for data.

How would I go about doing this? Would I need to use a web browser control?
Any help or advice? Thanks.
 
Rogelio,

You can use the WebClient class in the System.Net namespace, or, if you
need more control over the request, you can use the
HttpWebRequest/HttpWebResponse classes.
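
A minimal sketch of the WebClient route, in case it helps. The class name PageLoader and the choice of UTF-8 are my own assumptions, not anything from the question. Also note that newer .NET versions mark WebClient as obsolete in favor of HttpClient; the sketch sticks with the class named above.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper class, named for illustration only.
static class PageLoader
{
    // Downloads the resource at 'url' and returns its body as one string.
    public static string DownloadHtml(string url)
    {
        using (WebClient client = new WebClient())
        {
            // Encoding WebClient uses for string downloads;
            // UTF-8 is an assumption about the page.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

Calling PageLoader.DownloadHtml with the page's address should hand back the markup as a single string, with no browser control involved.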
 
Well, you can read from a file or a URL with various .NET APIs;
WebRequest/HttpWebRequest is useful for reading data from a URL. It depends
on where that "HTML page" lives: on the local file system, an HTTP server,
or an FTP server. The main problem in getting a string is finding out the
encoding of the HTML document. Browsers go to considerable lengths to
identify it by looking at HTTP headers and at meta elements in the HTML
document. Looking at HTTP headers is easy with
HttpWebRequest/HttpWebResponse; looking at meta elements is more work.
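
To illustrate the header half of that, here is a rough sketch that picks an Encoding from a Content-Type header value (the kind of string WebResponse.ContentType returns). The helper name CharsetHelper and the fallback parameter are my own inventions for the example. It does not look at meta elements at all, so treat it as a partial answer to the encoding problem.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper class, named for illustration only.
static class CharsetHelper
{
    // Picks an Encoding from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns 'fallback' when the header
    // is missing, malformed, or names a charset this runtime does not know.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
            return fallback;

        try
        {
            string charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset)
                ? fallback
                : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return fallback; // header could not be parsed
        }
        catch (ArgumentException)
        {
            return fallback; // charset name not supported
        }
    }
}
```

The resulting Encoding can then be passed to a StreamReader wrapped around the response stream.
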
However, there are tools for parsing HTML documents. One is SgmlReader
<URL:http://www.gotdotnet.com/Community/...mpleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>,
so I wouldn't try to parse a string of HTML with string functions or
regular expressions, if that is your aim. SgmlReader gives you an
XmlReader API over the HTML document, so the reader recognizes the
different node types: element, attribute, text, and comment nodes. You
can also pass the SgmlReader to other .NET APIs such as XmlDocument or
XPathDocument to make use of the DOM, XPath, and XSLT support in the
.NET Framework. That is much better than string parsing of an HTML
document.
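
As a rough idea of what the SgmlReader approach looks like, here is a sketch that loads a small HTML string into an XmlDocument and lists link targets with XPath. It assumes the SgmlReader assembly is referenced in the project, and the property names (DocType, CaseFolding, InputStream) are as I remember them from that library, so please check them against the version you download. The sample markup and the class name SgmlReaderSketch are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using Sgml; // SgmlReader library; a separate download, not part of .NET

// Hypothetical demo class, named for illustration only.
static class SgmlReaderSketch
{
    static void Main()
    {
        // Made-up sample input with typical non-XML HTML (unclosed tags,
        // unquoted attribute value).
        string html = "<html><body><p>Hello<br>world " +
                      "<a href=page.html>link</a></body></html>";

        SgmlReader reader = new SgmlReader();
        reader.DocType = "HTML";                   // use the built-in HTML DTD
        reader.CaseFolding = CaseFolding.ToLower;  // normalize element names
        reader.InputStream = new StringReader(html);

        // SgmlReader is an XmlReader, so XmlDocument can load from it.
        XmlDocument doc = new XmlDocument();
        doc.Load(reader);

        // Query the resulting DOM with XPath instead of string functions.
        foreach (XmlNode href in doc.SelectNodes("//a/@href"))
        {
            Console.WriteLine(href.Value);
        }
    }
}
```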
 
Use (Http)WebRequest.Create, call GetResponse and GetResponseStream, wrap
the resulting Stream in a StreamReader, and call ReadToEnd.
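
Spelled out, those steps might look roughly like this. The class name StreamPageReader is hypothetical, and UTF-8 is simply assumed here; the real encoding depends on the page.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, named for illustration only.
static class StreamPageReader
{
    // Requests 'url' and returns the whole response body as a string.
    public static string ReadAll(string url)
    {
        WebRequest request = WebRequest.Create(url);

        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        // UTF-8 is an assumption; pass the page's actual encoding if known.
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The sketch works against the WebRequest/WebResponse base types, so the same code should also accept other schemes that WebRequest.Create understands, such as file: URIs.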

Arne
 