Word 2002 - file formats?

  • Thread starter: Steve

Steve

Does anyone have a detailed description of the file structure (i.e. what
Word stores where) or format? Even better would be a viewer to translate
the content.

When I look at a .DOC file with a hex viewer, the document text is plainly
visible, as are some of the key fields, but there are always large tracts of
data which are not readily interpreted.

What am I up to? I want to be absolutely certain what information I am
handing to a client when I pass on soft copies of Word documents.

Best Regards

Steve
 
Hi Steve,

The binary file structure of Office documents is proprietary to Microsoft,
and they license it to third-party developers who demonstrate a business
case. You can find some information on msdn.microsoft.com (look for "OLE
structured storage") but you won't find much enlightenment unless you're an
experienced Windows programmer.
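
For anyone who wants to poke at the container itself, here is a minimal Python sketch that checks whether a file starts with the OLE structured storage signature and reads a few fixed header fields. The field offsets follow the published compound file layout as I understand it, the function names (read_ole_header, build_sample_header) are made up for illustration, and the sample header is synthetic, not a real document. It says nothing about Word's own streams inside the container.

```python
import struct

# Signature that opens an OLE structured storage (compound) file,
# the container used by binary .doc files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole_header(data: bytes) -> dict:
    """Parse a few fixed fields from the start of a compound file header."""
    if len(data) < 34 or data[:8] != OLE_MAGIC:
        raise ValueError("not an OLE compound file")
    # Offsets 24..33 hold five little-endian 16-bit values: minor version,
    # major version, byte-order mark, sector shift, mini-sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "major_version": major,
        "minor_version": minor,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }


def build_sample_header() -> bytes:
    """Hypothetical 512-byte header for demonstration only (not a valid document)."""
    header = bytearray(512)
    header[:8] = OLE_MAGIC
    struct.pack_into("<5H", header, 24, 0x003E, 3, 0xFFFE, 9, 6)
    return bytes(header)


if __name__ == "__main__":
    print(read_ole_header(build_sample_header()))
```

To try it on a real file, read the first 512 bytes in binary mode and pass them to read_ole_header. A ValueError simply means the file is not in this container format.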

There is, however, plenty of documentation about the kinds of personally
identifiable information stored in Word documents and how to get rid of it.
See the appropriate article for your version of Word:

How to Minimize Metadata in Word
WD97:
http://support.microsoft.com/?kbid=223790
WD2000:
http://support.microsoft.com/?kbid=237361
WD2002:
http://support.microsoft.com/?kbid=290945

Remove Hidden Data add-in for Office 2003 and Office XP
http://support.microsoft.com/?kbid=834427

Protecting Personal Data in Your Word 2003 Documents
http://www.msdn.microsoft.com/library/en-us/odc_wd2003_ta/html/odc_WDProtectWord2003.asp
 
Thanks Jay,

I've seen those articles, but as you say, MS don't cast any light on the
huge chunk of a .doc file which isn't ASCII-type content.

I've been playing with Word, and switching off "fast saves" appears to stop
the doc "growing". I've not done enough work to confirm that, though.

I was mailed one doc which was megabytes in size; when I cut and pasted the
text into a new document, it dropped to around the 50 KB mark. So what was
hidden in the original document?

Apart from security issues, it's a hideous waste of bandwidth.

Best Regards

Steve
 
Hi Steve,

Continuously growing files are a completely different issue from hidden
personal data. Things that can cause huge files include:

- Fast saves: This just tacks changes onto the end of the file and doesn't
remove anything until a full save occurs. Besides being unnecessary with
current machine speeds, it's a primary cause of document corruption. Turn it
off and leave it off.

- File > Versions. Keeps a copy of each version, so 5 versions = 5 x
document size.

- Embedded graphics, OLE objects, fonts, etc. It's not just the size of the
objects themselves -- sometimes pasting in a 100 KB picture can increase the
document's file size by over 1 MB. Use linking instead.

- Tracked changes. Before sending the document, accept or reject all
changes. Otherwise the recipient can see what you changed. Tracking also
keeps both the original and the changed content in the file until each
change is accepted or rejected -- especially bad if you replace graphics.

There are others, but these are the major ones.
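
Since a fast save only tacks changes onto the end of the file, text you thought was gone may still be sitting in it. A rough way to see what readable text a file carries is to scan the raw bytes for printable runs, both one byte per character and two bytes per character (UTF-16LE). The Python sketch below does that. The function name, the minimum run length, and the sample bytes are illustrative assumptions, and it will miss non-Latin text and anything stored in a packed or compressed form, so treat it as a quick inspection aid and not as proof that a document is clean.

```python
import re
from typing import List

MIN_RUN = 6  # shortest printable run worth reporting (arbitrary choice)

# Runs of printable ASCII stored one byte per character.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)
# Runs of printable ASCII stored as UTF-16LE (each character followed by a zero byte).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_RUN)


def readable_strings(data: bytes) -> List[str]:
    """Return printable runs found in raw bytes: 8-bit runs first, then UTF-16LE runs."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


if __name__ == "__main__":
    # Synthetic bytes standing in for a file's contents (not a real document).
    sample = (
        b"\x00\x01"
        + b"Draft for client A"
        + b"\xff\x00\x02"
        + "Steve Example".encode("utf-16-le")
        + b"\x00\x03"
    )
    for text in readable_strings(sample):
        print(text)
```

For a real file, read it in binary mode and pass the bytes to readable_strings, then compare the output before and after a full save with fast saves turned off.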
 