Extracting text from a "word document"-stream

  • Thread starter: Claus - Arcolutions

Claus - Arcolutions

I have a Word document as a stream, and I want to get the text out of the
document, but I can't seem to find anything to use for that purpose.

The "Microsoft office ?.object" COM reference only includes functionality to
read from a file (as far as I know).

I looked a little at the structure of the document, but it doesn't seem to
have any common structure, especially not when you compare files from
different Office versions.

Does anyone know of anything that could help me in my search?
 
Claus,

Why not save the contents of the stream to disk, and then read the
contents from that?

Also, I am pretty sure that the Document class in Word implements the
IPersistStream interface (I can't imagine that it doesn't). However, this
is a COM interface, and it doesn't work with .NET streams, rather, it works
with the IStream interface in COM. All in all, you are better off saving
the contents of a stream to a file on disk, and then working from that.
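
To make the "save it to disk first" suggestion concrete, here is a minimal
sketch, written in Python only for brevity. In .NET the same idea amounts to
copying the Stream into a FileStream. The function name
save_stream_to_temp_file and the ".doc" suffix are assumptions made for the
example, not anything Word requires.

```python
import os
import shutil
import tempfile


def save_stream_to_temp_file(stream, suffix=".doc"):
    """Copy a readable binary stream into a new temp file; return its path.

    The caller is responsible for deleting the file when finished with it.
    """
    # mkstemp creates the file securely and hands back an open descriptor.
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as target:
        # Copies in chunks, so large documents are not held in memory.
        shutil.copyfileobj(stream, target)
    return path
```

The returned path can then be handed to whatever reads Word files from disk,
and removed with os.remove(path) afterwards.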

Hope this helps.
 
Hi Claus,

Do you have any control over the format of the Word document, and over the
version of Word that creates it? If you do, you might consider using the XML
format supported by the Office 2003 Professional version of Word
(WordProcessingML is the format definition). You can look here for more
information on WordProcessingML:

http://msdn.microsoft.com/library/d...LCDK/html/cdkPrimerPlaceholder_HV01113631.asp
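
If the documents can be produced in that XML format, the text can be read
straight from the stream with an ordinary XML parser, with no Word
installation involved. The sketch below (Python, for brevity) assumes the
Word 2003 XML namespace and a simple document in which the text sits in w:t
elements inside w:p paragraphs. The function name extract_text and the
SAMPLE document are made up for the example, and documents containing text
boxes or other nested content would probably need more careful handling.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by the Word 2003 XML format.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_text(stream):
    """Return the text of a Word 2003 XML document, one line per paragraph."""
    tree = ET.parse(stream)  # accepts any readable file-like object
    paragraphs = []
    for para in tree.iter("{%s}p" % W_NS):
        # A paragraph is split into runs; each run keeps its text in w:t.
        pieces = [t.text or "" for t in para.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


# Hypothetical two-paragraph document, used only to exercise the function.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<w:wordDocument xmlns:w="' + W_NS.encode("ascii") + b'">'
    b"<w:body>"
    b"<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    b"<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    b"</w:body>"
    b"</w:wordDocument>"
)

if __name__ == "__main__":
    print(extract_text(io.BytesIO(SAMPLE)))
```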

A second option is to use a third-party component to access and manipulate
Word documents. A quick search turned up this article, which touts someone's
product: http://www.csharp-station.com/Articles/WordReports.aspx
I suspect there are many more.

Otherwise you are probably stuck with using Word automation, which is
terrifyingly slow for some operations (like table manipulation) and requires
Word to be installed on the machine. The WordReports article referenced
above does discuss how to access the Word automation interfaces.

Good luck.

Tom Clement
Serena Software, Inc.
 
