Open Source Word Document Parser

  • Thread starter: Dennis Myrén

Dennis Myrén

Hi.

Are there any open source libraries available for parsing Microsoft Word
documents?
Especially ones written in C# that can parse Word 2003 documents. I doubt there are.

Is there any specification of the format available from Microsoft?
I have searched, with no success.


Thank you. Dennis
 
Dennis said:
Are there any open source libraries available for parsing Microsoft Word
documents?
Especially ones written in C# that can parse Word 2003 documents. I doubt there are.

This doesn't directly help (I've looked for something similar for
PowerPoint with no success), but my group had an interesting idea:

If you can print a document to a printer, have your program emulate a printer
and then print directly into the program. PostScript printers work great for
this.

If anyone knows anything about free Word/Office parsers, please let the rest
of us know.
 
Dennis,

If there is anything, it is probably for previous versions of Word that
Microsoft no longer supports. If you know you are working with the latest
version of Word, then you could probably save the documents in XML format
and parse that.
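
To give a rough idea of what parsing the XML output could look like, here is
a minimal sketch. It assumes the file was saved from Word 2003 as "XML
Document", which uses the WordprocessingML namespace shown in the code. The
function name extract_paragraphs and the SAMPLE document are made up for
illustration, and the sketch is in Python only for brevity; the same walk over
w:p and w:t elements should be possible from C# with the System.Xml classes.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 "XML Document" files (WordprocessingML).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_paragraphs(xml_text):
    """Return the plain text of every paragraph (w:p) in the document.

    Hypothetical helper for illustration; formatting, tables, and other
    structure are ignored.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    # iter() searches at any depth, so paragraphs nested inside
    # section or table elements are found as well.
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs (w:r), each of which
        # holds one or more text nodes (w:t).
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


# Hand-written sample, much smaller than what Word actually emits.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>Word.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    for line in extract_paragraphs(SAMPLE):
        print(line)
```

This only pulls out paragraph text. Styles, lists, and embedded objects live
in other elements of the same file and would need their own handling.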

Hope this helps.
 