MSHTML and MSXML in VB6

Lucky

hi guys,
i need to parse html data that i've got from "Inet" object in vb6.
now i want to parse the html data, and i have 2 options: one is MSXML
and the other is MSHTML. i tried both of them but i didn't get anything
out of them. MSXML doesn't work with some keywords and treats some
characters as operators, so i can't go with MSXML. i tried MSHTML but it
doesn't provide any way to parse HTML data you got from another
source. there is a method (a factory method) in MSHTML which can be used
to get an HTMLDocument, but it has a problem: if the page you are trying
to load through code has a script that sets the focus to a control, the
MSHTML object throws an error. the error comes from inside MSHTML and
cannot be handled, and MSHTML doesn't receive any more data after
throwing it.
if anyone has any idea or suggestion on how i can solve this
problem, please do share it with me. this VB6 really sucks.


Thanks,
Lucky
 
Herfried K. Wagner [MVP]

Lucky said:
i need to parse html data that i've got from "Inet" object in vb6.

This is a VB.NET group. I suggest posting the question to one of the groups
in the "microsoft.public.vb.*" hierarchy.
 
Nicholas Paldino [.NET/C# MVP]

Lucky,

First, MSXML doesn't work because HTML is not XML, so you will almost
always get a parse error.
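
To make that point concrete, here is a minimal sketch in Python (not VB6 or MSXML, purely for illustration). It feeds a made-up snippet of ordinary HTML to a strict XML parser and then to a tolerant HTML parser. The sample markup and the helper names `parses_as_xml` and `collect_links` are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ordinary HTML that is not well-formed XML: an undeclared entity (&nbsp;),
# an unclosed <p> and <br>, and an unquoted attribute value.
SAMPLE_HTML = (
    "<html><body><p>Tom &amp; Jerry&nbsp;<br>"
    "<a href=page.html>next</a></body></html>"
)


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Tolerant HTML parser that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(text):
    """Parse text as HTML and return the list of anchor targets."""
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print("accepted by the XML parser:", parses_as_xml(SAMPLE_HTML))
    print("links found by the HTML parser:", collect_links(SAMPLE_HTML))
```

The XML parser rejects the snippet outright, while the HTML parser still recovers the link, which is the kind of tolerance a real HTML engine provides and an XML parser does not.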

As for MSHTML, it really isn't directly exposable from VB; it's more of
a COM interface (as opposed to Automation, which is what VB demands).

To load a page into MSHTML, you would have to somehow provide access to
the IMoniker interface that is returned from a call to the API
CreateURLMoniker (passing the URL of the source you want to download). Once
you do this, you can pass it to the IPersistMoniker::Load implementation on
MSHTML. This will cause MSHTML to trigger the load and parse the document.

However, doing any of this from VB is *extremely* difficult, and you
should probably use C++ to access these interfaces and perform this work,
exposing it to VB6 in the manner you desire.

Hope this helps.
 
Cor Ligthert [MVP]

As for MSHTML, it really isn't directly exposable from VB; it's more of
a COM interface (as opposed to Automation, which is what VB demands).

To load a page into MSHTML, you would have to somehow provide access to
the IMoniker interface that is returned from a call to the API
CreateURLMoniker (passing the URL of the source you want to download). Once
you do this, you can pass it to the IPersistMoniker::Load implementation
on MSHTML. This will cause MSHTML to trigger the load and parse the
document.

However, doing any of this from VB is *extremely* difficult, and you
should probably use C++ to access these interfaces and perform this work,
exposing it to VB6 in the manner you desire.

Using MSHTML in VB 2002/2005 is *extremely* simple.

http://www.vb-tips.com/default.aspx?ID=541adf13-d9c0-435c-893f-56dbb63fdf1c

The same goes for C#, of course.

However, this is of course not VB6.

:)

Cor
 
Nicholas Paldino [.NET/C# MVP]

Cor,

Yes, and unfortunately, the OP was looking for a VB6 solution.

Also, the code in the link that you sent is incorrect. Using the write
method of the document is not the correct way to feed content into MSHTML
(it is a common misconception). With that approach you have no control over
the headers that are sent back, which help with the processing of the
document, so it all has to be inferred.
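
As a rough illustration of why response headers matter when markup is interpreted, the Python sketch below decodes the same bytes twice: once using the charset declared in a Content-Type header, and once with no charset available. It shows the general principle only and says nothing about how MSHTML itself infers missing information. The helper names and the ISO-8859-1 fallback are assumptions made for this example.

```python
from email.message import Message


def charset_from_content_type(header_value, default="latin-1"):
    """Return the charset named in a Content-Type value, or a fallback.

    The latin-1 fallback is an assumption for this sketch, not a statement
    about what any particular HTML engine does.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode a response body using the charset from its Content-Type."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# The same bytes, as a server might send them.
RAW_BODY = "<p>café</p>".encode("utf-8")

with_charset = decode_body(RAW_BODY, "text/html; charset=UTF-8")
without_charset = decode_body(RAW_BODY, "text/html")

if __name__ == "__main__":
    print(with_charset)     # decoded with the declared charset
    print(without_charset)  # decoded with the guessed fallback
```

When the header is present the text round-trips cleanly; when it is missing, the consumer has to guess, and a wrong guess garbles the accented character.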

The correct way to feed source to MSHTML is to create an implementation
of IMoniker, and pass that through to IPersistMoniker.Load. Then, you can
stream your content from whatever source you like. Additionally, you can
mimic the source as if it was downloaded from a site, or obtained from some
other resource.
 
Cor Ligthert [MVP]

Nicholas,
Also, the code in the link that you sent is incorrect

Did you try it? (If not, then please do so before you write things like
this next time.)

When I created it, I tested it before placing it on the website.

About the headers: AFAIK headers are not implemented in the DOM,
while MSHTML represents the DOM (Document Object Model).

However, the part you are mentioning is only there to get an MSHTML
document.

I was actually busy with MSHTML today, and when you are used to it, it is
extremely easy in .NET. (It is in a normal .NET reference, by the way; there
was an error in the documentation part. It is not System.MSHTML but
Microsoft.MSHTML. I have changed that now.)

However, feel free to do it your way. I will keep to this "extremely" easy
way in .NET.

:))

Cor
 
Nicholas Paldino [.NET/C# MVP]

Cor,

No, I did not try it, but I do not have to, because I know the
architecture of MSHTML very well (forgive me for saying so).

Yes, the headers are not implemented in the DOM, BUT, how the DOM
interprets the stream of information that is sent to it is in part dictated
by the headers.

I'll give you an example of what doesn't work with this method.

Say, for example, that the document you have doesn't have absolute URLs,
but relative ones. When you create a new MSHTML document as in the example
and write the content using doc.write, it assumes a base URL of
"about:blank". It doesn't know how to interpret the relative URLs, and it
will reflect that when you try to access, say, the HREF property on the
object representation of an anchor (A) element.

However, if you use the IMoniker implementation, and feed the content
through that, while having the implementation of IMoniker::GetDisplayName
return the URL of the content itself, your URLs in the object model will be
absolute, not relative. The HREF property on the A element will return the
absolute URL, resolved with the base URL (returned from GetDisplayName), and
not a relative one.
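
The effect of the base URL can be sketched outside MSHTML with Python's standard `urllib.parse.urljoin`. This is only an analogy for the resolution step described above, not MSHTML's own code path. The page URL, the link, and the `resolve` helper are hypothetical values chosen for the example.

```python
from urllib.parse import urljoin


def resolve(base_url, link):
    """Resolve a link found in a document against the document's base URL."""
    return urljoin(base_url, link)


# Hypothetical relative link as it might appear in a downloaded page.
RELATIVE_LINK = "images/logo.png"

# Base URL when the content was simply written into an empty document.
unresolved = resolve("about:blank", RELATIVE_LINK)

# Base URL when the loader is told where the content really came from.
resolved = resolve("http://example.com/docs/page.html", RELATIVE_LINK)

if __name__ == "__main__":
    print(unresolved)  # images/logo.png
    print(resolved)    # http://example.com/docs/images/logo.png
```

With "about:blank" as the base there is nothing to resolve against, so the link stays relative; with the real page URL as the base it becomes absolute.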
 
Cor Ligthert [MVP]

Nicholas,

Feel free not to use it. However, as a short answer that I have not
investigated:

As I understand it, a relative URL is always relative to the host URL in
the DOM of the document, or of the parent document when frames are used.

However, as I said, it is no problem if you have a different opinion than I
do.

I see it working.

Cor
 
Herfried K. Wagner [MVP]

Cor Ligthert said:
Feel free not to use it. However, as a short answer that I have not
investigated:

As I understand it, a relative URL is always relative to the host URL in
the DOM of the document, or of the parent document when frames are used.

However, as I said, it is no problem if you have a different opinion than I
do.

I see it working.

Well, I believe it simply depends on what you want to achieve.
 
Nicholas Paldino [.NET/C# MVP]

Cor,

Don't take offense, as that was not my intent. The link that you
pointed to will work for a good number of situations, but it won't work for
all of them, and I was trying to point out those situations where that is
the case.
 
Lucky

hi guys,
thanks for your contributions and knowledge sharing. it was very
informative and wonderful. i want to inform "Herfried" that i'm also a
.NET developer and a regular visitor of this group, as my core expertise
is in vb.net. right now i'm stuck with VB6, and i thought only a VB group
could help me out. you must have seen how much knowledge sharing
occurred on my query.
anyways thanks guys.
 
Cor Ligthert [MVP]

Nicholas,

I never took it as an offense. I did not want people to get the idea that
Ken and I were placing untested samples on our website (which can even then
have errors, by the way, because of last-minute changes in the code).

Therefore I pointed primarily at that sentence of yours where you said that
the code was incorrect. You did not write incorrect "by my standards".

Luckily (I am thinking only of security), with this kind of operation there
are a lot of situations in which it will not work.

However, the samples are only there to show that MSHTML is basically very
simple if you know the DOM. Without that it is extremely difficult to use
(all the different interfaces make it very difficult as well).

Getting that page is really not simple if you cannot copy the code from
somewhere, which is what I did, in parts, from all over the Internet. The
only thing I did in that first part of the sample was assemble it into what
was needed and delete the parts that were not needed.

:)

Cor
 
