Extracting html source from a web page...


Konrad Rotuski

If you want to see the HTML source code 'manually', I'd recommend attaching to the IE process using VS.NET.

As for getting the HTML source via code, I think you should look at the IObjectWithSite interface and the related ones. Have a look at http://weblogs.asp.net/stevencohn/articles/60948.aspx for more information.

HTH

Konrad
I am trying to get at the source of a web page. Looking at the innerHTML element is only part of the story. In IE, right-clicking on various different parts of the page gives me different results when I click on View Source.

The source I need is contained inside IFRAME tags (which contain references to JSP pages). The HTML content isn't available when I look at the innerHTML of the document returned in the DocumentComplete event of the WebBrowser control. My question, basically, is: how do I get the HTML generated by the JSP page in the IFRAME? Better yet, how do I get the complete HTML as it is rendered by IE?

A snippet of VB.NET code would be much appreciated, if possible.

Many thanks in advance

-=NaJ=-
 

Joerg Jooss

> I am trying to get at the source of a web page. Looking at the
> innerHTML element is only part of the story. In IE, right-clicking
> on various different parts of the page gives me different results
> when I click on View Source.

Because you're looking at multiple sources...

> The source I need is contained inside IFRAME tags (which contain
> references to JSP pages)... The HTML content isn't available when I
> look at the innerHTML of the document returned in the
> DocumentComplete event of the WebBrowser control. My question is
> basically, how do I get the HTML generated by the JSP page in the
> IFRAME?

Simply download the content referenced by the IFRAME's SRC attribute using
System.Net.WebClient or System.Net.WebRequest.

> Better yet, how do I get the complete HTML as it is rendered
> by IE?

There's no such thing. You're basically looking at two distinct HTML
documents at the same time: the outer page and the page loaded into the IFRAME.
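Joerg's first suggestion (fetch whatever the IFRAME's SRC attribute points to) could look roughly like this in C#, the language of the other code in this thread. It is only a minimal sketch under some assumptions: the SRC values appear literally in the page's HTML (frames created by script will be missed), a regular expression stands in for a real HTML parser, and the JSP pages can be fetched without the cookies or session IE is holding (WebClient does not share IE's cookies, so a login-protected frame may return a different page). The class `IframeSources` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: finds IFRAME SRC URLs in a page and downloads them.
static class IframeSources
{
    // Rough pattern for <iframe ... src="..."> or src='...'.
    // Not a real HTML parser: unquoted SRC values and script-created frames are missed.
    static readonly Regex IframeSrc = new Regex(
        @"<iframe\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of every IFRAME found in pageHtml.
    // Relative SRC values (e.g. "content.jsp") are resolved against pageUrl,
    // and entities such as &amp; are decoded first.
    public static List<Uri> FindIframeUrls(string pageHtml, Uri pageUrl)
    {
        var urls = new List<Uri>();
        foreach (Match m in IframeSrc.Matches(pageHtml))
        {
            string src = WebUtility.HtmlDecode(m.Groups[1].Value);
            urls.Add(new Uri(pageUrl, src));
        }
        return urls;
    }

    // Downloads the HTML the server (e.g. the JSP page) generates for each IFRAME,
    // one string per frame, in the order the frames appear in the page.
    public static List<string> DownloadIframeHtml(string pageHtml, Uri pageUrl)
    {
        var pages = new List<string>();
        using (var client = new WebClient())
        {
            foreach (Uri url in FindIframeUrls(pageHtml, pageUrl))
            {
                pages.Add(client.DownloadString(url));
            }
        }
        return pages;
    }
}
```

For example, `IframeSources.DownloadIframeHtml(pageHtml, new Uri("http://localhost/Project/test.htm"))` returns the frame documents for a top-level page whose source is in `pageHtml`. Resolving against the page URL matters because a bare `content.jsp` cannot be downloaded on its own.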

Cheers,
 

Raed Sawalha

I did that with one of my pages, like this:

Code:
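using System;
using System.IO;
using System.Net;
using System.Text;

// Check whether the URL points to an HTML file
// (let's say we have this string containing the file name)
string html = "http://localhost/Project/test.htm".Trim(); // remove white space before testing the extension

if (html.EndsWith(".htm", StringComparison.OrdinalIgnoreCase) ||
    html.EndsWith(".html", StringComparison.OrdinalIgnoreCase))
{
    // Construct a StringBuilder to collect the page
    StringBuilder sBuilder = new StringBuilder();

    try
    {
        // Request the page
        HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(html);

        // Get the response; "using" closes the response and the reader when done
        using (HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse())
        // Encoding.Default is the system ANSI code page on the .NET Framework;
        // pages saved in another encoding may need a different Encoding here
        using (StreamReader webstream = new StreamReader(webresponse.GetResponseStream(), Encoding.Default))
        {
            string temp;

            // Read the content of the HTML file line by line until end-of-file
            while ((temp = webstream.ReadLine()) != null)
            {
                sBuilder.AppendLine(temp); // appends Environment.NewLine ("\r\n" on Windows)
            }
        }

        // Save the content in a variable
        string HtmlContent = sBuilder.ToString();
    }
    catch (WebException ex)
    {
        // Network failures and HTTP error status codes (404, 500, ...) end up here
        Console.WriteLine("Could not download " + html + ": " + ex.Message);
    }
}

Hope that's what you need.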



Regards



 

news

I am trying to get at the source of a web page. Looking at the innerHTML element is only part of the story. In IE, right-clicking on various different parts of the page gives me different results when I click on View Source.

The source I need is contained inside IFRAME tags (which contain references to JSP pages). The HTML content isn't available when I look at the innerHTML of the document returned in the DocumentComplete event of the WebBrowser control. My question, basically, is: how do I get the HTML generated by the JSP page in the IFRAME? Better yet, how do I get the complete HTML as it is rendered by IE?

A snippet of VB.NET code would be much appreciated, if possible.

Many thanks in advance

-=NaJ=-
 
