Save text from webpage -- how hard can it be???

Mark Davies

I'm trying to do something very simple -- capture the text from a web page.
Can't get it to work. Have scoured the web for info -- no luck
(even/especially MSDN).

Here's what I've tried:

-------------

Added COM reference [Microsoft Internet Controls]

-------------

Public WithEvents wb As SHDocVw.WebBrowser
wb = New SHDocVw.WebBrowser
wb.Visible = False
wb.Navigate("http://www.sitex.com") ' <<<< PROGRAM DIES HERE <<<<<

dim txtFromWebPage as string
txtFromWebPage = wb.Document.Body.innerText

-------------

I get the following at the [wb.Navigate] line (marked above):

An unhandled exception of type 'System.Runtime.InteropServices.COMException'
occurred in Show HTML.exe
Additional information: Unspecified error

Any suggestions???

P.S.:

If I use SHDocVw.InternetExplorer instead of SHDocVw.WebBrowser, it won't
crash at this point, but then I can't figure out how to get the text from
the webpage (objIE.Document.Body.innerText doesn't work).
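
A possible reason objIE.Document.Body.innerText appears not to work: Navigate returns immediately, before the page has loaded, so Document is not ready yet on the very next line. The following is only a minimal sketch of waiting for the load to finish before reading the text. It assumes the same COM reference [Microsoft Internet Controls], a console project, and late binding on Document (Option Strict Off); the 10-second timeout and the name pollsLeft are illustrative choices, not something from the original post.

```vb
Option Strict Off ' assumption: Document is accessed late-bound in this sketch

Imports System
Imports System.Threading

Module ShowHtml
    <STAThread()> Sub Main()
        ' Requires the COM reference [Microsoft Internet Controls] (SHDocVw).
        Dim objIE As New SHDocVw.InternetExplorer()
        objIE.Visible = False
        objIE.Navigate("http://www.sitex.com")

        ' Navigate is asynchronous: poll every 200 ms, up to 50 times (about 10 seconds).
        Dim pollsLeft As Integer = 50
        While pollsLeft > 0 AndAlso (objIE.Busy OrElse objIE.ReadyState <> SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE)
            Thread.Sleep(200)
            pollsLeft -= 1
        End While

        If objIE.ReadyState = SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE Then
            ' Only now is the document available for reading.
            Dim txtFromWebPage As String = objIE.Document.body.innerText
            Console.WriteLine(txtFromWebPage)
        Else
            Console.WriteLine("Page did not finish loading in time.")
        End If

        ' Close the hidden Internet Explorer instance.
        objIE.Quit()
    End Sub
End Module
```

Handling the DocumentComplete event instead of polling would be the more conventional design in a WinForms app; polling is used here only to keep the sketch short.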
 
Kael V. Dowdy

Here is some code I use to do something similar...however, it is for a
Classic ASP application, and not a .NET WinForms or WebForms app.
Maybe it'll give you some ideas???

HTH.

Kael MCSD

----------------------
<%
Response.Buffer = True
Dim oXMLHTTP

' Create an xmlhttp object:
Set oXMLHTTP = Server.CreateObject("Microsoft.XMLHTTP")
' Or, for version 3.0 of XMLHTTP, use:
'Set oXMLHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")

' Opens the connection to the remote server (False = synchronous).
oXMLHTTP.Open "GET", "http://www.the-url-of-page-you-wanna-get.com", False

' Actually Sends the request and returns the data:
oXMLHTTP.Send

'Display the HTML both as HTML and as text
Response.Write "<h1>The HTML text</h1><xmp>"
Response.Write oXMLHTTP.responseText
Response.Write "</xmp><p><hr><p><h1>The HTML Output</h1>"
Response.Write oXMLHTTP.responseText

Set oXMLHTTP = Nothing
%>
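
The same fetch-over-HTTP idea can be sketched in VB.NET without any browser control. This is a rough sketch, not a definitive implementation: it assumes .NET Framework 4.0 or later (for System.Net.WebUtility), and StripTags is a hypothetical helper that removes markup with regular expressions, which is crude compared with the innerText a real HTML parser or browser would produce.

```vb
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module FetchPageText
    ' Hypothetical helper: crude tag stripping, not a real HTML parser.
    Function StripTags(ByVal html As String) As String
        ' Drop script and style blocks, including their contents.
        Dim noScripts As String = Regex.Replace(html, "<(script|style)[^>]*>.*?</\1>", " ", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Replace every remaining tag with a space.
        Dim noTags As String = Regex.Replace(noScripts, "<[^>]+>", " ")
        ' Decode entities such as &amp; and collapse runs of whitespace.
        Dim decoded As String = WebUtility.HtmlDecode(noTags)
        Return Regex.Replace(decoded, "\s+", " ").Trim()
    End Function

    Sub Main()
        Using client As New WebClient()
            ' Synchronous GET, the counterpart of oXMLHTTP.Open/Send above.
            Dim html As String = client.DownloadString("http://www.sitex.com")
            Console.WriteLine(StripTags(html))
        End Using
    End Sub
End Module
```

Because this never runs the page's scripts, text that the page generates in the browser will be missing; for such pages a hosted browser control is still needed.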
 
Matora Nikolay

Hi. Try this code.
You have to use [MTAThread].
You will also have to implement the IPersistMoniker interface (see MSDN), or
maybe find "olelib.dll", which already provides one.

[DllImport("UrlMon.dll", CharSet=CharSet.Auto)]
private static extern void CreateURLMoniker(UCOMIMoniker pMkCtx, [In,
MarshalAs(UnmanagedType.LPWStr)] string szURL, out UCOMIMoniker ppmk);

[DllImport("ole32.dll", CharSet=CharSet.Auto)]
private static extern void CreateBindCtx(uint reserved, out UCOMIBindCtx
ppbc);

// Code:
mshtml.IHTMLDocument2 htmlDoc = null;

UCOMIBindCtx ctx1;
UCOMIMoniker moniker1;
olelib.IPersistMoniker pf;

htmlDoc = new mshtml.HTMLDocumentClass();
pf = htmlDoc as olelib.IPersistMoniker;
CreateBindCtx(0, out ctx1);
CreateURLMoniker(null, url, out moniker1);
pf.Load(1, moniker1 as olelib.IMoniker, ctx1 as olelib.IBindCtx, 0);

string curState = htmlDoc.readyState;
int secondsLeft = 50; // 50 polls x 200 ms = 10 sec
while ((secondsLeft > 0) && (curState != "complete")) {
    Thread.Sleep(200); secondsLeft--;
    curState = htmlDoc.readyState;
}

// If curState != "complete" at this point, the load failed or timed out.


