Convert HTML to XML or Parse HTML

  • Thread starter: Q.Z.

Q.Z.

Hello,

Does anybody know of a .NET- or COM-based library that can
parse HTML, or convert HTML to XML so I can query it with
XPath?

Thanks
Qin Zhou
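The question is about .NET and COM libraries, but the idea behind "convert HTML to XML so XPath works" is language-neutral: run a lenient HTML tokenizer, rebuild a well-formed tree (self-closing void elements such as <br>, closing tags the author forgot), then query that tree. The sketch below assumes Python and only its standard library (html.parser and xml.etree.ElementTree). It is not SgmlReader or either of the COM components mentioned in this thread, and the names HtmlToTree, html_to_xml and the synthetic html-root element are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML "void" elements never have an end tag, so they are never left open.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class HtmlToTree(HTMLParser):
    """Hypothetical helper: builds a well-formed ElementTree from loose HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Synthetic root so fragments with several top-level elements still form one tree.
        self.root = ET.Element("html-root")
        self.stack = [self.root]

    @staticmethod
    def _attrs(attrs):
        # Valueless attributes such as <input disabled> arrive as (name, None).
        return {name: (value or "") for name, value in attrs}

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, self._attrs(attrs))
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_startendtag(self, tag, attrs):
        # <tag/> syntax: create the element but never leave it open.
        ET.SubElement(self.stack[-1], tag, self._attrs(attrs))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, implicitly closing
        # anything left open inside it; stray end tags with no match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # ElementTree stores text that follows a child in that child's .tail.
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    """Parse loose HTML and return the synthetic root element."""
    parser = HtmlToTree()
    parser.feed(html)
    parser.close()
    return parser.root


if __name__ == "__main__":
    page = ("<html><body><p class=intro>Hello<br>world"
            "<p>Second <a href='/x'>link</a></body></html>")
    doc = html_to_xml(page)
    print(ET.tostring(doc, encoding="unicode"))          # well-formed XML
    print([a.get("href") for a in doc.findall(".//a")])  # XPath-style query
```

Note that ElementTree's find/findall understand only a subset of XPath (paths, [@attr='value'] predicates and a few others). The sketch also does not apply HTML's implied end tags, so the second <p> above ends up nested inside the first rather than becoming its sibling as it would in a browser DOM; a full HTML parser handles those rules.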
 
Hi Q.Z,


Thank you for using Microsoft Newsgroup Service. Based on your description,
you are looking for a COM or .NET component that can convert an HTML
document into an XML (XHTML)-style document. Is my understanding correct?

If so, I think Ken Cox has already provided some good sites on this topic;
they point to two COM components. You may want to try them to see whether
they help.

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)
 
If you load your page into a WebBrowser control, you can parse it through the DOM.
This is slow, but it works.


David Elliott said:
I have tried the SgmlReader but am having difficulty with some sites, such as www.msn.com.

If I could find a way to parse HTML using C/C++/C#, I would be happy. All I really
need is an array of <tag> and <data> entries. Finer granularity is not necessary, just
the raw information. I do need the entire page, though, from the opening <html>
to the closing </html>.

I would prefer an HTML-to-XML conversion, but as time is limited, any solution would be
appreciated.

Thanks,
Dave
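For the simpler requirement in the post above (a flat array of <tag> and <data> entries covering the whole page), no tree is needed at all. A minimal sketch, again assuming Python's standard html.parser rather than any C/C++/C# library; TagDataTokenizer and tokenize are hypothetical names, and HTML comments are deliberately ignored.

```python
from html.parser import HTMLParser


class TagDataTokenizer(HTMLParser):
    """Hypothetical helper: flattens a page into ("tag", ...) / ("data", ...) pairs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_decl(self, decl):
        # e.g. <!DOCTYPE html>
        self.tokens.append(("tag", f"<!{decl}>"))

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written, attributes included.
        self.tokens.append(("tag", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing <tag/>: record it once instead of as a start + end pair.
        self.tokens.append(("tag", self.get_starttag_text()))

    def handle_endtag(self, tag):
        # html.parser reports end tags by lowercased name only, so the text is rebuilt.
        self.tokens.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        # Keep text as-is, but drop whitespace-only runs between tags.
        # handle_comment is not overridden, so <!-- ... --> is skipped.
        if data.strip():
            self.tokens.append(("data", data))


def tokenize(html):
    """Return the whole page, <html> through </html>, as a flat token list."""
    tokenizer = TagDataTokenizer()
    tokenizer.feed(html)
    tokenizer.close()
    return tokenizer.tokens


if __name__ == "__main__":
    sample = "<html><body><p class=x>Hi <b>there</b></p><br/></body></html>"
    for kind, value in tokenize(sample):
        print(kind, value)
```

Because html.parser is generally tolerant of unbalanced markup, a flat scan like this is less likely to trip over pages that are not well-formed; the trade-off is that reconstructing nesting is left to the caller.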
 
Take a look
http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx

George.
