Load all HTML into a string

Guest · Sep 9, 2007

Hey, I want to get the entire contents of an HTML page and put all the HTML
code it returns in a string, so that I can parse that string for data.

How would I go about doing this? Would I need to use a web browser control?
Any help/advice? Thanks.

Nicholas Paldino [.NET/C# MVP] · Sep 9, 2007

Rogelio,

You can use the WebClient class in the System.Net namespace, or, if you
need more control over the request, you can use the
HttpWebRequest/HttpWebResponse classes.
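
For readers following along, here is a minimal sketch of the WebClient route. The class name PageDownloader, the method name DownloadHtml, and the fallbackEncoding parameter are made up for illustration and are not from the post; only WebClient, its Encoding property, and DownloadString are real framework members.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper; the names here are illustrative only.
public static class PageDownloader
{
    // Downloads the resource at the given address and returns it as one string.
    public static string DownloadHtml(string address, Encoding fallbackEncoding)
    {
        using (WebClient client = new WebClient())
        {
            // Encoding WebClient uses for string downloads
            // (it is the system default encoding unless set).
            client.Encoding = fallbackEncoding;
            return client.DownloadString(address);
        }
    }
}
```

Something like PageDownloader.DownloadHtml("http://example.com/", Encoding.UTF8) should then hand back the page markup as a string, assuming the address is reachable.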

Martin Honnen · Sep 9, 2007

Well, you can read from a file or a URL with various .NET APIs;
WebRequest/HttpWebRequest is useful for reading data from a URL. It depends
on where that "HTML page" lives: on the local file system, an HTTP server,
or an FTP server.

The main problem in getting a string is finding out the encoding of the
HTML document. HTML browsers go to complicated lengths to identify it by
looking at HTTP headers and at meta elements in the HTML document. Looking
at HTTP headers is easy with HttpWebRequest/HttpWebResponse; looking at
meta elements is more work.

However, there are tools to parse HTML documents. One is SgmlReader
<URL:http://www.gotdotnet.com/Community/...mpleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>,
so I wouldn't try to parse a string of HTML with string functions or
regular expressions if that is your aim. With SgmlReader you get an
XmlReader API over the HTML document, so the reader recognizes the
different node types: element nodes, attribute nodes, text nodes, and
comment nodes. You can also pass the SgmlReader to other .NET APIs like
XmlDocument or XPathDocument to make use of the DOM, XPath, and XSLT
support in the .NET Framework. That is much better than string parsing of
an HTML document.
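
On the encoding point, a rough sketch of the "easy" half (reading the charset from the HTTP headers) might look like the following. HtmlFetcher, ResolveEncoding, FetchHtml, and the fallback parameter are hypothetical names introduced here; the sketch does not inspect meta elements, so it covers only part of what a browser does.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper; the names here are illustrative only.
public static class HtmlFetcher
{
    // Maps a charset name such as "utf-8" to an Encoding.
    // Returns the fallback when the name is missing or not recognized.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrEmpty(charset))
            return fallback;
        try
        {
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }

    // Fetches a page over HTTP and decodes it with the charset taken from
    // the Content-Type response header (exposed as CharacterSet).
    public static string FetchHtml(string url, Encoding fallback)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            // Depending on the .NET version, CharacterSet may report a default
            // such as ISO-8859-1 when the header names no charset.
            Encoding encoding = ResolveEncoding(response.CharacterSet, fallback);
            using (StreamReader reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

The cast to HttpWebRequest means this sketch only handles http and https URLs. A page whose real encoding is declared only in a meta element can still be decoded incorrectly.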

Guest · Sep 9, 2007

Call (Http)WebRequest.Create, get the response, wrap the resulting Stream
in a StreamReader, and use ReadToEnd.

Arne
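
The steps above can be sketched roughly as follows. PageReader and ReadAll are hypothetical names, and the sketch relies on StreamReader's default decoding (UTF-8 unless a byte order mark says otherwise), which is an assumption that will not suit every page.

```csharp
using System.IO;
using System.Net;

// Hypothetical helper; the names here are illustrative only.
public static class PageReader
{
    // Creates a request for the URL, opens the response stream,
    // and reads the whole body into one string.
    public static string ReadAll(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Because it uses the base WebRequest type rather than casting to HttpWebRequest, the same method should also accept file URIs.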
