heavy problem with HTMLDocument

pierre

Hi, I have a problem that may be easy to resolve, but I can't
find a solution:

I want to parse HTML files, so I first want to fetch one from a
URL, and I do it like this:

Dim objMSHTML As New mshtml.HTMLDocument()
Dim objDocument As mshtml.HTMLDocument
objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

Normally this should work and I could parse the HTML
code, but in fact I get this error:

"An unhandled exception of type
'System.NullReferenceException' occurred in
mscorlib.dll
Additional information: Object reference not set to an
instance of an object."

(translated; sorry, my VB version is French)

Any idea?
PS: I think this code works in VB6...
 
Patrick Steele [MVP]

Use the built-in .NET networking objects. See:

http://tinyurl.com/98ey
 
pierre

Thank you, Patrick.
I've just read the article, but it doesn't seem that it can
help me parse the HTML. Using mshtml.HTMLDocument, I thought
I could use the "links" property, which is supposed to give
access to the links in the HTML...

 
Cor

Pierre,
I've never seen this method, so I am curious whether it works, but I can't
try it right away.

I would advise you to take a look at the "webbrowser" control; with it you
can "navigate" to a URL.
(It uses Internet Explorer 6, don't ask me how.)

Then with the "DocumentComplete" events from the "webbrowser" you can get
the documents as DOM objects.

When there are frames, there is a document for every frame.
There is also a navigate-complete event, but with that you only get the last
page downloaded.

That's why I find the method you use strange, but I saw it in the
documentation too.

I hope I have pointed you in the right direction.
It is too much to give a quick example.

The webbrowser is only one of the methods I think you can use, but it is
the one I use for these things at the moment.

I hope it helps you a little bit.
Cor
 
Patrick Steele [MVP]

Sorry -- forgot about your parsing issue.

Perhaps you could get the raw HTML using the .NET WebRequest and then
feed that into the mshtml.HTMLDocument object. I've never used that
object before so I'm not sure if you can load it with your own HTML.
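
A rough, untested sketch of that idea, in case it helps. It assumes a project reference to Microsoft.mshtml and that writing the markup through IHTMLDocument2.write is an acceptable way to load your own HTML; the module name and the FetchHtml helper are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module FetchAndParseSketch
    ' Hypothetical helper: download the raw HTML of a page as a string.
    Function FetchHtml(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Dim response As WebResponse = request.GetResponse()
        Try
            Dim reader As New StreamReader(response.GetResponseStream())
            Return reader.ReadToEnd()
        Finally
            ' Always release the connection, even if reading fails.
            response.Close()
        End Try
    End Function

    ' mshtml is a COM component and expects a single-threaded apartment.
    <STAThread()> Sub Main()
        Dim html As String = FetchHtml("http://www.google.fr")

        ' Assumption: feeding the markup through write/close populates the DOM.
        Dim doc As mshtml.IHTMLDocument2 = _
            DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        doc.write(html)
        doc.close()

        Console.WriteLine(doc.title)
    End Sub
End Module
```

StreamReader uses UTF-8 unless told otherwise, so a page served in another encoding may need an explicit Encoding argument.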
 
Charles Law

Hi Pierre

The problem is that although you create a new mshtml.HTMLDocument, it is not
being initialised.

Try the following:

<code>
Dim objMSHTML As New mshtml.HTMLDocument
Dim objDocument As mshtml.IHTMLDocument2
Dim ips As IPersistStreamInit

' Initialise the new document before using it.
ips = DirectCast(objMSHTML, IPersistStreamInit)
ips.InitNew()

objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

' Pump messages until the document has finished loading.
Do Until objDocument.readyState = "complete"
    Application.DoEvents()
Loop

Debug.WriteLine(objDocument.body.outerHTML)
</code>

At the end of this you can access the DOM. Note that you need to define the
IPersistStreamInit interface.
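
Since the original goal was the "links" property, here is a small untested sketch of walking it once the document has loaded. The module and the PrintLinks name are hypothetical; it assumes the document passed in has already reached readyState "complete" and that the project references Microsoft.mshtml.

```vbnet
Imports System

Module LinkListerSketch
    ' Hypothetical helper: print the href of every link in a loaded document.
    Sub PrintLinks(ByVal doc As mshtml.IHTMLDocument2)
        Dim element As mshtml.IHTMLElement
        ' The links collection holds the document's A and AREA elements
        ' that carry an href attribute.
        For Each element In doc.links
            Console.WriteLine(element.getAttribute("href", 0))
        Next
    End Sub
End Module
```

Reading the attribute through IHTMLElement rather than casting to IHTMLAnchorElement keeps the loop working for AREA elements as well.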

HTH

Charles


 
Charles Law

Pierre

In case you don't have it, here is the IPersistStreamInit interface
definition

<code>
Imports System.Runtime.InteropServices

<ComVisible(True), ComImport(), _
 Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
 InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    ' IPersist interface
    Sub GetClassID(ByRef pClassID As Guid)

    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</code>

HTH

Charles
 
pierre

Thanks a lot, it works perfectly :)
P.
