HTML to XML


Pietje de kort

Hello everybody,

HTML Tidy is currently my key tool for converting HTML into XML,
but I don't know whether I am allowed to ship it with a commercial
product (probably not!). So I'm looking for a replacement.

After searching the web for a while, I found that the mshtml classes
might be exactly what I'm looking for. The HTMLDocument class, for instance,
contains references to all frames and forms, so I expect it uses some
sort of XSLT internally.

What I would like to do with mshtml is convert any HTML document into
XHTML and then parse that XHTML into a System.Xml.XmlDocument, which
I can use for further processing.
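A minimal sketch of the HTML-to-XHTML step, assuming the standard-library html.parser module is acceptable as a stand-in for mshtml or HTML Tidy. It is written in Python rather than a CLR language, and the names TreeBuildingParser and html_to_xml are hypothetical. The parser closes void elements such as <br>, quotes attribute values, and closes any elements the HTML leaves open, so typical input comes out well-formed. It is not a Tidy replacement: comments and the DOCTYPE are dropped, and HTML's implied end-tag rules are not implemented, so <p>one<p>two comes out nested rather than as two siblings.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content or an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TreeBuildingParser(HTMLParser):
    """Hypothetical helper: builds a well-formed ElementTree from lenient HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")
        self.stack = [self.root]  # currently open elements, innermost last

    @staticmethod
    def _attrs(attrs):
        # Minimized attributes such as <input disabled> arrive with value None;
        # XHTML spells them out as disabled="disabled".
        return {name: (name if value is None else value) for name, value in attrs}

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            # Merge into the synthetic root instead of nesting a second <html>.
            self.root.attrib.update(self._attrs(attrs))
            return
        elem = ET.SubElement(self.stack[-1], tag, self._attrs(attrs))
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, implicitly closing
        # anything left open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # ElementTree keeps text before the first child in .text and
        # text after a child in that child's .tail.
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Return a well-formed, XHTML-like XML string for html_text."""
    parser = TreeBuildingParser()
    parser.feed(html_text)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


print(html_to_xml("<div class=box>one<br>two<img src=a.png></div>"))
# <html><div class="box">one<br />two<img src="a.png" /></div></html>
```

In .NET the resulting string is the kind of input XmlDocument.LoadXml expects; in this sketch ElementTree plays that role.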

Does anybody know how this can be done using mshtml, MSXML, or any other
method based on CLR classes? Of course I know that HTML is by definition not
valid XML; that's why the conversion to XHTML has to happen first.
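The point that HTML is not valid XML can be checked directly. A small sketch in Python, assuming the standard library's xml.etree.ElementTree as a stand-in for System.Xml.XmlDocument; is_well_formed_xml is a hypothetical helper name. A strict XML parser rejects the HTML form (unclosed <br>, unquoted attribute) and accepts the XHTML spelling of the same content.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Hypothetical helper: True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Valid HTML, but not XML: unquoted attribute and an unclosed <br>.
print(is_well_formed_xml("<p class=note>one<br>two</p>"))      # False
# The same content written as XHTML.
print(is_well_formed_xml('<p class="note">one<br />two</p>'))  # True
```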

Any help would be greatly appreciated,

Best regards,
Wouter van Vugt
