"source view" of current page in IE with c# help...

trint

How can I get the source ("view source") of the current page in IE with C#?
I load the current page with this:

private void FrmMain_Load(object sender, System.EventArgs e)
{
    object loc = "http://www.google.com/";
    object null_obj_str = "";
    System.Object null_obj = 0;
    this.axWebBrowser1.Navigate2(ref loc, ref null_obj, ref null_obj,
        ref null_obj_str, ref null_obj_str);
}

private void axWebBrowser1_DocumentComplete(object sender,
    AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent e)
{
    // Task is a field on the form (not shown here) that tracks the current step.
    switch (Task)
    {
        case 1:
            // Cast the loaded document directly; creating a new
            // HTMLDocumentClass first is not needed.
            HTMLDocument myDoc = (HTMLDocument) axWebBrowser1.Document;

            // a quick look at the google html source reveals:
            // <INPUT maxLength="256" size="55" name="q">
            HTMLInputElement otxtSearchBox =
                (HTMLInputElement) myDoc.all.item("q", 0);
            otxtSearchBox.value = "cargo ship companies";

            // google html source for the I'm Feeling Lucky Button:
            // <INPUT type=submit value="I'm Feeling Lucky" name=btnI>
            HTMLInputElement btnSearch =
                (HTMLInputElement) myDoc.all.item("btnI", 0);
            btnSearch.click();

            Task++;
            break;

        case 2:
            // continuation of automated tasks...
            break;
    }
}

Once I can get the page source, I want to save it as a text file.
Thanks,
Trint
 
Gareth Erskine-Jones

Are you saying you want to see the source before it's sent to the
browser? You can do that by overriding Render in your ASP.NET page:


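// Requires: using System.IO; using System.Text; using System.Web.UI;
protected override void Render(HtmlTextWriter writer)
{
    // Render the page into an in-memory writer instead of the response.
    StringBuilder stringBuilder = new StringBuilder();
    StringWriter stringWriter = new StringWriter(stringBuilder);
    HtmlTextWriter htmlWriter = new HtmlTextWriter(stringWriter);
    base.Render(htmlWriter);

    string yourHtml = stringBuilder.ToString();

    // ***** Save/Modify/Email yourHtml *****

    // Send the captured HTML on to the browser.
    writer.Write(yourHtml);
}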


(this was lifted from http://forums.asp.net/t/180545.aspx)

GEJ
 
trint

I want to save the "source" to a string, then write it to a text
file. Next I will load the text file and pull the http:// addresses out
of it, like:

first http://www.addressOnThisPage.com
next http://www.addressOnThisPage.com
and so on...
Thanks,
Trint
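
The save-and-reload step described above could be sketched roughly as follows. This is only a minimal illustration, assuming the page HTML is already held in a string; the class name PageSourceFile and the file path are made up for the example and are not from the thread.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (name invented for this sketch).
public static class PageSourceFile
{
    // Writes the page source to a text file, replacing any existing file.
    public static void Save(string path, string html)
    {
        File.WriteAllText(path, html, Encoding.UTF8);
    }

    // Reads the saved page source back into a single string.
    public static string Load(string path)
    {
        return File.ReadAllText(path, Encoding.UTF8);
    }
}
```

A call such as PageSourceFile.Save("page.txt", html) followed by PageSourceFile.Load("page.txt") should hand back the same text. If the file is only a stepping stone, the addresses can just as well be taken from the string directly.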
 
David

It sounds like you are attempting to make a spider / webcrawler type of
application.

Search Google for Searcharoo (or look on CodeProject). This will show you how
to create a spider, but even if that is not what you want, it will show you
how to parse the text for links.

--
Best regards,
Dave Colliver.
http://www.AshfieldFOCUS.com
~~
http://www.FOCUSPortals.com - Local franchises available


 
trint

I simply want to get all of the web addresses in a string that holds
an html page.
example:

<TD VALIGN=center ALIGN=center><A HREF="http://www.apl.com/" target="new">American President Line</A></TD>
<TD VALIGN=center ALIGN=center><A HREF="http://www.maerskline.com/" target="new">Maersk-SeaLand</A></TD>
<TD VALIGN=center ALIGN=center><A HREF="http://www.chevron.com/about/our_businesses/shipping.asp" target="new">Chevron Shipping</A></TD>

I want to pull these out of the string containing the whole HTML page
and put them into a separate string:

http://www.apl.com
http://www.maerskline.com
http://www.chevron.com

Thanks,
Trint
 
David

In that case, and you obviously haven't even looked, the spider code will
work for you.

You may have found out... many people on here are happy to help YOU find a
solution, but won't actually write the code for you. (I personally think
this is the best way to learn). We will attempt to point you in the right
direction.

A web spider / crawler has the code to do it as it is how they crawl, by
reading the URLs then putting them into an array, then following them.

If you don't want to go that route, then look at Regular Expressions.
--
Best regards,
Dave Colliver.
http://www.AshfieldFOCUS.com
~~
http://www.FOCUSPortals.com - Local franchises available



 
trint

I'm checking out searcharoo on codeproject now.
Thanks,
Trint
 
trint

Here is what I'm trying to do with my program:

1. Start Google.com:

object loc = "http://www.google.com/";
object null_obj_str = "";
System.Object null_obj = 0;
this.axWebBrowser1.Navigate2(ref loc, ref null_obj, ref null_obj,
    ref null_obj_str, ref null_obj_str);

2. Then start a search for all cargo ship companies:
2a. This works great: the "I'm Feeling Lucky" button takes me straight to a
site that lists all the cargo ship companies I need:


HTMLInputElement otxtSearchBox =
    (HTMLInputElement) myDoc.all.item("q", 0);
otxtSearchBox.value = "cargo ship companies";

// google html source for the I'm Feeling Lucky Button:
// <INPUT type=submit value="I'm Feeling Lucky" name=btnI>
HTMLInputElement btnSearch =
    (HTMLInputElement) myDoc.all.item("btnI", 0);
btnSearch.click();

3. I want to take the page that my program has now loaded in the browser
and put its source in a string:
3a. This also works great, and the string contains all of the
"http://addresses.com" entries that I want to grab out of it:

mshtml.HTMLDocumentClass doc =
    (mshtml.HTMLDocumentClass) this.axWebBrowser1.Document;

// innerHTML is already a string, so no StringBuilder is needed.
string str1 = doc.documentElement.innerHTML;

4. Now, all I want is for the .com addresses in this string (str1) to be
parsed out into a string array that looks like the list below.
4a. This is where I am stuck and have no code yet:

http://www.address1.com
http://www.address2.com
http://www.address3.com
http://www.address4.com
....etc.

any help is appreciated,
Thanks,
Trint
 
Jesse Houwing

Hello trint,

I already provided you with a regex that would find all the domains, but
if this is your scenario, then please stop right now, and navigate to this
URL:
http://code.google.com/apis/soapsearch/reference.html

Delete the code you have and start over...
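
The regex mentioned above is not shown in this thread. For step 4 of the question, a minimal sketch of the regular-expression route might look like the following. It assumes the page source is already in a string (str1 in the question) and that the links use quoted href attributes; the names LinkExtractor and GetSiteAddresses are invented for this example. Treat it as one possible approach, not the regex referred to above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (names invented for this sketch).
public static class LinkExtractor
{
    // Matches href="..." or href='...' values that start with http:// or https://.
    // [^"']* also spans line breaks, so an address wrapped across lines is captured.
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*[\"'](https?://[^\"']*)[\"']",
        RegexOptions.IgnoreCase);

    // Returns each distinct scheme + host found in the HTML, in order of appearance.
    public static string[] GetSiteAddresses(string html)
    {
        List<string> sites = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            // Remove any whitespace that line wrapping left inside the address.
            string raw = Regex.Replace(match.Groups[1].Value, "\\s+", "");

            Uri uri;
            if (!Uri.TryCreate(raw, UriKind.Absolute, out uri))
                continue; // skip anything that is not a valid absolute URL

            // Keep only the scheme and host; path, query and port are dropped.
            string site = uri.Scheme + "://" + uri.Host;
            if (!sites.Contains(site))
                sites.Add(site);
        }
        return sites.ToArray();
    }
}
```

Calling LinkExtractor.GetSiteAddresses(str1) on the table rows quoted earlier in the thread should give an array holding http://www.apl.com, http://www.maerskline.com and http://www.chevron.com. Unquoted href values and relative links are ignored by this pattern, so it is a starting point rather than a general HTML parser.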
 
