Webpage to text -- how hard can it be???


Mark Davies

I'm trying to do something very simple -- capture the text from a web page.
Can't get it to work. Have scoured the web for info -- no luck
(even/especially MSDN).

Here's what I've tried:

-------------

Added COM reference [Microsoft Internet Controls]

-------------

Public WithEvents wb As SHDocVw.WebBrowser
wb = New SHDocVw.WebBrowser
wb.Visible = False
wb.Navigate("http://www.sitex.com") ' <<<< PROGRAM DIES HERE <<<<<


dim txtFromWebPage as string
txtFromWebPage = wb.Document.Body.innerText

-------------

I get the following at the [wb.Navigate] line (marked above):

An unhandled exception of type 'System.Runtime.InteropServices.COMException'
occurred in Show HTML.exe
Additional information: Unspecified error

Any suggestions???

P.S.:

If I use SHDocVw.InternetExplorer instead of SHDocVw.WebBrowser, it won't
crash at this point, but then I can't figure out how to get the text from
the webpage (objIE.Document.Body.innerText doesn't work).
 

Guest

Gidday

Unless you specifically need the COM control, I'd bury it. With all due
respect to Microsoft, I think that control is a piece of shit. The same
could be said for many things, but why they didn't replace it with
managed code in v1.1 I'll never understand. It is ".NET" after all.
Roll on "Whidbey"!

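' At the top of the file:
Imports System.IO
Imports System.Net

Dim html As String
Dim request As WebRequest
Dim response As WebResponse = Nothing
Dim urlStreamReader As StreamReader = Nothing

Try
    request = WebRequest.Create(New Uri("http://www.sitex.com"))
    response = request.GetResponse()
    ' GetResponseStream returns a Stream, not a StreamReader, so wrap it
    urlStreamReader = New StreamReader(response.GetResponseStream())
    html = urlStreamReader.ReadToEnd()
Finally
    ' Close the reader and the response even if something above threw
    If Not urlStreamReader Is Nothing Then urlStreamReader.Close()
    If Not response Is Nothing Then response.Close()
End Try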

This gets you the page's raw HTML, tags and all. Then you can do what
you like with it.
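Richard's snippet stops at the raw markup, while Mark asked for the text. One rough way to get from one to the other is to strip the tags with regular expressions. This is a minimal sketch, not a real HTML parser, and the HtmlText module and HtmlToText function are hypothetical names. It drops script and style blocks and comments, replaces the remaining tags with spaces, decodes entities such as &amp; and &nbsp;, and collapses whitespace. WebUtility.HtmlDecode needs .NET Framework 4.0 or later; on older frameworks System.Web.HttpUtility.HtmlDecode does the same job.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical helper: crude regex-based tag stripping, NOT a real HTML parser.
Module HtmlText
    Public Function HtmlToText(ByVal html As String) As String
        ' Remove <script> and <style> blocks together with their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", " ", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Remove HTML comments.
        text = Regex.Replace(text, "<!--.*?-->", " ", RegexOptions.Singleline)
        ' Replace every remaining tag with a space so adjacent words stay apart.
        text = Regex.Replace(text, "<[^>]+>", " ")
        ' Turn entities such as &amp; and &nbsp; back into characters.
        text = WebUtility.HtmlDecode(text)
        ' Collapse whitespace runs; \s also matches the no-break space from &nbsp;.
        text = Regex.Replace(text, "\s+", " ")
        Return text.Trim()
    End Function
End Module
```

Usage would be something like Dim txtFromWebPage As String = HtmlToText(html). Expect it to trip over malformed markup, a ">" inside an attribute value, or text that only appears once script runs; anything that has to be reliable should walk a proper DOM instead (for example MSHTML, as Cor suggests below).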

hth
Richard
 

Cor

Hi Mark,

Use the AxWebBrowser for this; it does a lot more than the newer WebBrowser.

Combine it with MSHTML, which you can add from the normal Add Reference
dialog. MSHTML gives you the page's Document Object Model. Do not add an
Imports statement for it: it has too many interfaces and your IDE becomes
terribly slow. Instead, refer to its types by their fully qualified names
(for example mshtml.IHTMLDocument2) wherever you need them.

I hope this helps?

cor
 

Herfried K. Wagner [MVP]

* "Mark Davies" said:
> I'm trying to do something very simple -- capture the text from a web page.
> Can't get it to work. Have scoured the web for info -- no luck
> (even/especially MSDN).

Why don't you use 'WebClient.DownloadFile'?
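Herfried's suggestion sidesteps the COM control entirely. Below is a minimal sketch, assuming the page is UTF-8 (check the site's actual charset) and using made-up names (PageFetcher, FetchViaFile, FetchAsString). DownloadFile saves the page to a local path, which you then read back; WebClient.DownloadString skips the intermediate file. The sketch targets .NET 2.0 or later, since Using blocks, File.ReadAllText and DownloadString are not in 1.1; on 1.1, WebClient.DownloadData followed by Encoding.GetString gives the same result.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

' Hypothetical helper module; the names are for illustration only.
Module PageFetcher
    ' Herfried's suggestion: save the page to a local file, then read it back.
    Public Function FetchViaFile(ByVal address As String, ByVal localPath As String) As String
        Using client As New WebClient()
            client.DownloadFile(address, localPath)
        End Using
        ' Assumption: the page is UTF-8 encoded.
        Return File.ReadAllText(localPath, Encoding.UTF8)
    End Function

    ' Variation (.NET 2.0 and later): DownloadString skips the intermediate file.
    Public Function FetchAsString(ByVal address As String) As String
        Using client As New WebClient()
            client.Encoding = Encoding.UTF8 ' assumption: UTF-8 page
            Return client.DownloadString(address)
        End Using
    End Function
End Module
```

For example, Dim html As String = FetchAsString("http://www.sitex.com"). Either way you get raw HTML back, not text, so the tags still need stripping afterwards.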
 