Rogelio said:
Hey, I want to get the entire contents of an HTML page and put all the HTML
code it returns in a string so that I can parse that string for data.
How would I go about doing this? Would I need to use a web browser control?
Any help/advice? Thanks.

Well, you can read from a file or a URL with various .NET APIs;
WebRequest/HttpWebRequest is useful for reading data from a URL. It depends
on where that "HTML page" lives: on the local file system, an HTTP
server, or an FTP server. The main problem in getting a string is finding
out the encoding of the HTML document. Browsers go to considerable
lengths to identify it, looking at HTTP headers and at meta elements in
the HTML document. Looking at HTTP headers is easy with
HttpWebRequest/HttpWebResponse; looking at meta elements is more work.
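
To make the header-then-meta order concrete, here is a rough sketch. It is written in Python rather than .NET only to keep it short; the logic carries over to any language. The function names, the 4096-byte scan window, and the windows-1252 fallback are my own assumptions, and the byte-level scan for the meta element is a simplification of what browsers really do (it is needed only because the document cannot be parsed properly before its encoding is known).

```python
import re

# Hypothetical helpers for illustration; not part of any .NET or browser API.
_CHARSET_IN_HEADER = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
_CHARSET_IN_META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)',
                              re.IGNORECASE)


def guess_encoding(content_type, body, default="windows-1252"):
    """Return a charset name: HTTP header first, then meta element, then default."""
    if content_type:
        match = _CHARSET_IN_HEADER.search(content_type)
        if match:
            return match.group(1).lower()
    # Only scan the start of the document; meta elements belong in the head.
    match = _CHARSET_IN_META.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_html(content_type, body, default="windows-1252"):
    """Decode the raw bytes of a page into a string using the guessed encoding."""
    encoding = guess_encoding(content_type, body, default)
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:
        # The page declared a charset name this runtime does not know.
        return body.decode(default, errors="replace")
```

Here content_type is the value of the Content-Type response header (or None for a local file) and body is the raw bytes of the response, so decode_html(content_type, body) gives you the string you asked for.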
However, there are tools to parse HTML documents; one is SgmlReader
<URL:
http://www.gotdotnet.com/Community/...mpleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>
so I wouldn't try to parse a string of HTML with string functions or
regular expressions if that is your aim. With SgmlReader you get an
XmlReader API over the HTML document, so the reader recognizes the
different node types: element, attribute, text, and comment nodes.
You can also pass the SgmlReader to other .NET APIs such as
XmlDocument or XPathDocument to make use of the DOM, XPath, and XSLT
support in the .NET Framework. That is much better than string parsing
of an HTML document.
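
To show what "the reader recognizes the different nodes" buys you, here is a minimal sketch of the same idea. It does not use SgmlReader; it uses Python's standard html.parser module as a stand-in, and the NodeLister class and list_nodes function are names I made up for the example. The parser reports elements (with their attributes), text, and comments as separate events, so you never have to hunt for angle brackets yourself.

```python
from html.parser import HTMLParser


class NodeLister(HTMLParser):
    """Records each node the parser reports, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.nodes.append(("element", tag, dict(attrs)))

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        if data.strip():
            self.nodes.append(("text", data.strip()))

    def handle_comment(self, data):
        self.nodes.append(("comment", data.strip()))


def list_nodes(html):
    """Parse an HTML string and return the recorded nodes."""
    parser = NodeLister()
    parser.feed(html)
    parser.close()
    return parser.nodes
```

For example, list_nodes('<p class="x">Hi<!-- note --><br>there</p>') reports the p element with its class attribute, the text "Hi", the comment, the br element, and the text "there", each as its own entry.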