WordML or Word DOM

  • Thread starter: Gadigi

Gadigi

Hello,

I have a task wherein a Word document has to be converted to HTML. I was
thinking of writing XSLT and using WordML. Has anyone done this before? Or
is it better to work with the Word DOM?

I cannot use the built-in "Save as HTML web page" option, as there are
customizations that need to be applied.

TIA.
Suresh
 
Hi Gadigi,
The office.xml newsgroup might be a better place to follow up on this. I
don't know what you mean by "Word DOM". Word documents can be saved in the
binary file format (not generally "readable" unless you have the BIFF), as
RTF, as Word's round-trip HTML, as "filtered" HTML (where you'll lose
Word-specific info) and as WordProcessingML (Word's XML). With the exception
of the *.doc format, you can process any of these formats using VB, VBA or
any other programming language. Saving as WordProcessingML does require
Word 2003.

The route you describe is certainly possible, and I'm sure it's been done
before. I think there's even some information on the MSDN site.
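As a rough illustration of what processing WordprocessingML outside Word involves, here is a minimal sketch in Python. It swaps Python's standard `xml.etree.ElementTree` in for the XSLT the original post had in mind, and it assumes a Word 2003 XML file using the `http://schemas.microsoft.com/office/word/2003/wordml` namespace, built-in heading style IDs of the form `Heading1` to `Heading6`, and only bold and italic run formatting. Tables, lists, images and `w:val="off"` toggles are not handled; `wordml_to_html` and its helpers are hypothetical names.

```python
import html
import xml.etree.ElementTree as ET

# Namespace of Word 2003 WordprocessingML (assumed; check your files' root element).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def run_to_html(run):
    """Convert one <w:r> run to escaped HTML, wrapping bold/italic runs."""
    text = html.escape("".join(t.text or "" for t in run.iter(W + "t")))
    props = run.find(W + "rPr")
    if props is not None:
        # Presence of <w:b/> or <w:i/> is treated as "on"; w:val="off" is ignored.
        if props.find(W + "b") is not None:
            text = f"<strong>{text}</strong>"
        if props.find(W + "i") is not None:
            text = f"<em>{text}</em>"
    return text


def paragraph_to_html(para):
    """Map a <w:p> to <h1>-<h6> for HeadingN styles, otherwise to <p>."""
    tag = "p"
    style = para.find(f"{W}pPr/{W}pStyle")
    if style is not None:
        name = style.get(W + "val", "")
        level = name[len("Heading"):]
        if name.startswith("Heading") and level.isdigit() and 1 <= int(level) <= 6:
            tag = "h" + level
    body = "".join(run_to_html(r) for r in para.iter(W + "r"))
    return f"<{tag}>{body}</{tag}>"


def wordml_to_html(xml_text):
    """Return one HTML element per WordprocessingML paragraph in the body."""
    root = ET.fromstring(xml_text)
    body = root.find(W + "body")
    if body is None:
        return ""
    # iter() walks all descendants, so paragraphs nested in sections or
    # table cells are included (and flattened).
    return "\n".join(paragraph_to_html(p) for p in body.iter(W + "p"))
```

For example, `wordml_to_html(open("report.xml", encoding="utf-8").read())` would return one HTML element per line, where `report.xml` is a placeholder file name. Even this small subset shows how much of the mapping has to be written by hand, which is worth weighing before committing to the WordML route.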

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :-)
 
By "Word DOM" I was referring to the Word object model. All the documents to
convert are in Word 2003 format.

I was just wondering which approach is better for converting Word 2003
documents into HTML. Are there any known issues with WordProcessingML?
 
If you are aiming at (X)HTML I don't think WordProcessingML is a good place
to start. (A brief look at the saved XML file is probably enough to see why
<g>).

Word's own HTML is very unpleasant, but the HTMLTidy tool does an excellent
job of converting it to valid XHTML. If you are happy writing XSLTs to do
any further processing or styling, that's probably as good a place as any to
start. (I use HTMLTidy from within the TopStyle HTML and CSS editor, but
it's freely available from http://tidy.sourceforge.net/)
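The approach described here is to tidy Word's HTML and then post-process it with XSLT. As a minimal sketch of that clean-up step, the Python below uses the standard library's `html.parser` instead of XSLT (a swapped-in technique, not the one the post describes). It assumes the Word-specific noise to strip is inline `style` attributes, class names beginning with `Mso`, `<o:p>` elements and Word's embedded `<style>` block; that rule set and the names `WordHtmlCleaner` and `clean_word_html` are illustrative assumptions. HTMLTidy's own `word-2000` option targets similar clutter and may make some of this unnecessary.

```python
import html
from html.parser import HTMLParser

# Assumed clean-up rules for Word-generated HTML (adjust to taste).
DROP_TAG_KEEP_TEXT = {"o:p"}   # Office-namespace paragraph markers
DROP_WITH_CONTENT = {"style"}  # Word's embedded stylesheet


class WordHtmlCleaner(HTMLParser):
    """Re-serialise HTML without inline styles, Mso* classes, <o:p> and <style>.

    Comments (including Word's conditional comments) are dropped because
    handle_comment is not overridden.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside an element dropped with its content

    def _attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name == "style" or (name == "class" and (value or "").startswith("Mso")):
                continue
            kept.append(f" {name}" if value is None
                        else f' {name}="{html.escape(value, quote=True)}"')
        return "".join(kept)

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep the DOCTYPE

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip += 1
        elif not self._skip and tag not in DROP_TAG_KEEP_TEXT:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if not self._skip and tag not in DROP_TAG_KEEP_TEXT | DROP_WITH_CONTENT:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif not self._skip and tag not in DROP_TAG_KEEP_TEXT:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            # Entities were decoded by the parser, so re-escape text content.
            self.out.append(html.escape(data, quote=False))


def clean_word_html(markup):
    """Return markup with the assumed Word-specific noise removed."""
    cleaner = WordHtmlCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)
```

The output is still HTML, not guaranteed XHTML, so running it through HTMLTidy afterwards (or before, on Word's original output) remains sensible.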

If you need something other than the document content in the HTML (values of
document properties, for instance) then some minor preprocessing using VBA
macros is probably still easier than picking out the bones from the WordML.
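With the pre-processing route, one small step remains: merging the property values into the HTML. The Python sketch below assumes the values have already been extracted (for instance by a VBA macro reading `Document.BuiltInDocumentProperties`) into a dictionary of name/value strings; that dictionary shape and the functions `properties_to_head` and `inject_into_head` are hypothetical.

```python
import html


def properties_to_head(props):
    """Render document properties as a <title> plus one <meta> per other property."""
    lines = []
    if props.get("Title"):
        lines.append(f"<title>{html.escape(props['Title'])}</title>")
    for name in sorted(props):
        if name == "Title":
            continue  # already emitted as <title>
        lines.append(
            f'<meta name="{html.escape(name.lower())}" '
            f'content="{html.escape(props[name])}" />'
        )
    return "\n".join(lines)


def inject_into_head(document, props):
    """Insert the rendered properties just before the first </head>."""
    index = document.lower().find("</head>")
    if index == -1:
        return document  # no <head>: leave the document untouched
    return document[:index] + properties_to_head(props) + "\n" + document[index:]
```

Keeping the extraction in VBA and the merge as plain string work avoids having to locate the property elements inside the WordML at all.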
 