Retrieving HTML source code

  • Thread starter: KelsMckin
KelsMckin

What is the easiest way to retrieve the HTML source from a web page?
Could you please provide an example?

Any and all help would be greatly appreciated.
 
Hi KelsMcKin,

Assuming you mean how to download a web page by code, the simplest way is
by using a WebClient.

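// Requires: using System.Net; using System.Text;
using (WebClient wc = new WebClient())
{
    // Download the raw bytes of the page.
    byte[] data = wc.DownloadData("http://www.downloadurl.com");
    // Decode the bytes as UTF-8 text.
    string html = Encoding.UTF8.GetString(data);
}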

This assumes the data is UTF-8 encoded.
 

Thank you so much, that was exactly what I was looking for.
 
Uh oh, I seem to be getting a cPanel message for every site. Do you
have any idea why? I also get it if I use the URL http://google.com.
The code I get:

<HTML>
<HEAD>
<TITLE>cPanel</TITLE>
<link href="sys_cpanel/css/style.cssx" rel="stylesheet"
type="text/css">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;
charset=iso-8859-1">
<style>
body { font-family: verdana, arial, helvetica, sans-serif;
font-size: 11px; background-color:#367E8E; scrollbar-base-color:
#005B70; scrollbar-arrow-color: #F3960B; scrollbar-DarkShadow-Color:
#000000; }
a { color:#ffffff; text-decoration:none }
</style>
</HEAD>
<BODY leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<table width="100%" height="100%" border="0" cellspacing="0"
cellpadding="0">
<tr valign="top">
<td height="75" nowrap valign="top">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="10%"><a href="http://www.cpanel.net"><img
src="sys_cpanel/images/index_01.gif" width="126" height="46"
alt="cPanel" border=0></a></td>
<td width="27%"><img src="sys_cpanel/images/index_02.gif"
width="343" height="46"></td>
<td width="1%"
background="sys_cpanel/images/index_04.gif"><img
src="sys_cpanel/images/index_04.gif" width="43" height="46"></td>
<td width="62%" align="right"
background="sys_cpanel/images/index_04.gif"><img
src="sys_cpanel/images/index_03.gif" width="138" height="46"></td>
</tr>
</table>
</td>
</tr>
<tr>
<td valign="top">
<div style="color:ff9900; font-weight:bold; font-size:24pt;
text-align:center">There is no website configured at this
address.</div><br>
<br>
<div style="color:ffffff">
You are seeing this page because there is nothing configured for the
site you have requested. If you think you are seeing this page in
error, please contact the site administrator or datacenter responsible
for this site.<br>
</div></td></tr>
<tr><td valign="bottom">
<table width=100%>
<tr><td>
<div style="color:ff9900; font-weight:bold">About cPanel:</div><br>
<div style="color:ffffff">cPanel is a leading provider of software for
the webhosting industry. If you would like to learn more about cPanel
please visit our website at <a class=josh
href="http://www.cpanel.net/">http://www.cpanel.net/</a>. Please be
advised that cPanel is not a webhosting company itself, and as such is
not responsible for content found elsewhere on this site.</div>
</tr>
</table>
</td>
</tr>
<tr>
<td height="10">
<table width="100%" border="0" cellspacing="0" cellpadding="0"
background="sys_cpanel/images/bbg.gif">
<tr align="center">
<td background="sys_cpanel/images/bbg.gif"><img
src="sys_cpanel/images/bbg.gif" width="179" height="22"></td>
<td background="sys_cpanel/images/bbg.gif"><img
src="sys_cpanel/images/bottom_label.gif" width="382" height="22"></td>
<td background="sys_cpanel/images/bbg.gif"><img
src="sys_cpanel/images/bbg.gif" width="179" height="22"></td>
</tr>
</table>
</td>
</tr>
</table>
<!--- REVISION: 1.2 --->
</BODY>
</HTML>
 
Well,

Using this code I get

<html><head><meta http-equiv="content-type" content="text/html;
charset=ISO-8859-1"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{color:#0000cc;}
-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
// -->

and so on

I'm not familiar with the cPanel site management tool, but I suspect it is
intercepting your network stream at some level.
 
How do you display the code?

The actual data has its HTML tags in lower case.

Try saving the data directly to a file and use Notepad to view it.

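// Requires: using System.IO; using System.Net;
using (WebClient wc = new WebClient())
{
    byte[] data = wc.DownloadData("http://www.google.com");

    // Write the bytes unchanged so no decoding step can alter them.
    using (FileStream fs = File.Create(@"C:\Test.html"))
    {
        fs.Write(data, 0, data.Length);
    }
}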
 
It would be easier (and more correct) to use the WebClient's
DownloadString method, which uses the encoding specified in the
Encoding property to convert the resource to a string. Alternatively,
if you wish to write the data to a file without bothering with buffers
and FileStream objects, you could use the DownloadFile convenience
method.
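
For illustration, a minimal sketch of both convenience methods might look like the following. The class and method names (PageFetcher, DownloadText, SaveToFile) are made up for this example, and it assumes .NET 2.0 or later, where DownloadString is available.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
static class PageFetcher
{
    // Downloads a resource and returns it as a string, setting the
    // WebClient's Encoding property before the download.
    public static string DownloadText(string url, Encoding encoding)
    {
        using (WebClient wc = new WebClient())
        {
            wc.Encoding = encoding;
            return wc.DownloadString(url);
        }
    }

    // Saves a resource straight to disk, with no explicit buffer or
    // FileStream handling in the calling code.
    public static void SaveToFile(string url, string path)
    {
        using (WebClient wc = new WebClient())
        {
            wc.DownloadFile(url, path);
        }
    }
}
```

A call such as PageFetcher.DownloadText("http://www.google.com", Encoding.UTF8) would then replace the DownloadData/GetString pair, and PageFetcher.SaveToFile("http://www.google.com", @"C:\Test.html") would replace the FileStream version.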
 
That may be easier, but if an HTML page is encoded in a different
format than IIS claims, this would lead to a few more lines of
code. In either case you would need to parse the HTML to detect the
proper encoding, so in the end, downloading it with the most likely correct
encoding to begin with would be smarter.
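
As a rough sketch of what "parsing the HTML to detect the encoding" could mean, the code below looks for a charset declaration in a meta tag near the start of the downloaded bytes and falls back to a caller-supplied encoding when none is found. The names HtmlDecoder, DetectEncoding and Decode, the regular expression, and the 2048-byte prefix are all assumptions made for this example. It is not a complete detection algorithm.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class HtmlDecoder
{
    // Matches a charset declaration inside a <meta> tag, for example:
    // <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Returns the encoding declared in the page, or the fallback when
    // no declaration is found or the declared name is not recognised.
    public static Encoding DetectEncoding(byte[] data, Encoding fallback)
    {
        // The declaration itself is plain ASCII, so reading a prefix
        // as ASCII is enough to find it (the prefix size is arbitrary).
        int prefixLength = Math.Min(data.Length, 2048);
        string head = Encoding.ASCII.GetString(data, 0, prefixLength);

        Match match = CharsetPattern.Match(head);
        if (!match.Success)
        {
            return fallback;
        }

        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return fallback;
        }
    }

    // Decodes the downloaded bytes using the detected encoding.
    public static string Decode(byte[] data, Encoding fallback)
    {
        return DetectEncoding(data, fallback).GetString(data);
    }
}
```

With the bytes from DownloadData in hand, HtmlDecoder.Decode(data, Encoding.UTF8) would use the page's own declaration when there is one and UTF-8 otherwise.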

Note, though, that the DownloadString method is only supported from .NET 2.0
onward; in 1.0 and 1.1 you are stuck with DownloadData or DownloadFile, though
there is always HttpWebRequest ...



 