Insane filesizes in Word 2k/XP


Martin Brown

HELP!! I have tried the MSKB but sadly found nothing obvious.

The situation is that various groups generate technical documents on a
mixture of Win 2k and XP platforms. The problem is that under some as yet
undetermined circumstances, when small images or illustrations are
included, the filesize grows without bound.

For example, a minor report with 60kb of text and 100kb of drawings
was rejected by the email system because the Word DOC file was 40MB.

I have established that most of the hit comes from huge OLE data being
included at some point. And then at some other point along the workflow
an incompatibility sometimes occurs that spontaneously doubles the
filesize again.

That is, for every included image or picture, and for the OLE data, a
twin is created with a name ending in "_". Here is a small example.

e.g. Directory of C:\qwerty_image_files

25/01/2006  15:58    <DIR>          .
25/01/2006  15:58    <DIR>          ..
25/01/2006  15:58               234 filelist.xm_
25/01/2006  15:58               234 filelist.xm~
25/01/2006  15:58             1,512 image001.gif
25/01/2006  15:58             2,084 image003.wmz
25/01/2006  15:58             1,512 image004.gi~
25/01/2006  15:58         3,760,186 oledata.ms_
25/01/2006  15:58         3,760,186 oledata.ms~
               7 File(s)      7,525,948 bytes

7.5MB for a short document containing one tiny 1500 byte GIF !!!
These odd Word documents contain more than 99.9% wasted space!
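
To make the arithmetic behind that figure explicit, here is a minimal Python sketch that totals the sizes from the listing above and works out what share belongs to the two oledata files. The helper name `parse_sizes` is made up for illustration, and the parsing assumes the DIR output format shown in this post.

```python
import re

# File rows copied from the directory listing in this post.
LISTING = """\
25/01/2006 15:58 <DIR> .
25/01/2006 15:58 <DIR> ..
25/01/2006 15:58 234 filelist.xm_
25/01/2006 15:58 234 filelist.xm~
25/01/2006 15:58 1,512 image001.gif
25/01/2006 15:58 2,084 image003.wmz
25/01/2006 15:58 1,512 image004.gi~
25/01/2006 15:58 3,760,186 oledata.ms_
25/01/2006 15:58 3,760,186 oledata.ms~
"""

# date, time, size with thousands separators, file name
ROW = re.compile(r"\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}\s+([\d,]+)\s+(\S+)$")


def parse_sizes(listing):
    """Return {file name: size in bytes}; <DIR> rows do not match and are skipped."""
    sizes = {}
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match:
            sizes[match.group(2)] = int(match.group(1).replace(",", ""))
    return sizes


sizes = parse_sizes(LISTING)
total = sum(sizes.values())
ole = sum(size for name, size in sizes.items() if name.startswith("oledata"))

print(f"total bytes:   {total:,}")
print(f"oledata bytes: {ole:,}")
print(f"oledata share: {ole / total:.2%}")
```

The total agrees with the 7,525,948 bytes reported by DIR, and almost all of it sits in the two oledata files.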

I hope that the magic number "3,760,186" is a give-away about the root
cause of this massive explosion in size. I suspect drag & drop...

Properties reports that the offenders claim to be of normal type
"Microsoft Word 97-2002 Document"

Some reports are now reaching 200MB in size despite the fact that their
true information content is under 500kb.

Exporting the entire document to HTML format, then deleting oledata.mso
and oledata.ms_ and opening what is left produces a new file with normal
size but with the original text formatting somewhat mutilated.
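
For anyone wanting to triage many documents the same way, the following is a rough Python sketch, not a tested tool, that measures how much of an HTML-export folder is taken up by oledata files. It assumes the folder looks like the listing above (files whose names start with "oledata."); the function name `ole_bloat_report` is hypothetical.

```python
import sys
from pathlib import Path


def ole_bloat_report(folder):
    """Return (ole_bytes, total_bytes, ole_files) for one exported folder.

    ole_files is a sorted list of (name, size) pairs for every file whose
    name starts with "oledata." (an assumption based on the listing above).
    """
    ole_files = []
    total_bytes = 0
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # ignore sub-directories
        size = path.stat().st_size
        total_bytes += size
        if path.name.lower().startswith("oledata."):
            ole_files.append((path.name, size))
    ole_bytes = sum(size for _, size in ole_files)
    return ole_bytes, total_bytes, sorted(ole_files)


if __name__ == "__main__":
    # Usage: python ole_report.py C:\qwerty_image_files [more folders...]
    for folder in sys.argv[1:]:
        ole_bytes, total_bytes, ole_files = ole_bloat_report(folder)
        share = ole_bytes / total_bytes if total_bytes else 0.0
        print(f"{folder}: {ole_bytes:,} of {total_bytes:,} bytes are OLE data ({share:.1%})")
        for name, size in ole_files:
            print(f"    {size:>12,}  {name}")
```

The script only reads file sizes and deletes nothing, so it is safe to run over a whole tree of exports before deciding which documents need attention.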

I thought I had a solution with a script that deleted and recreated
every image in a document. But for some recent documents this fix no
longer works and they remain stubbornly obese 40MB files with 200kb of
useful content. Deleting *all* the images at once restores normality.

I would be grateful for any pointers where to look next. I cannot
reproduce this malady on my own machines, and I have yet to witness what
it is the authors do to trigger this problem. They claim that nothing
has changed at their end.

Thanks for any pointers or suggestions on what to look for or try next.

Regards,
Martin Brown
 

Cindy M -WordMVP-

Hi Martin,

> I have established that most of the hit comes from huge OLE data being
> included at some point. And then at some other point along the workflow
> an incompatibility sometimes occurs that spontaneously doubles the
> filesize again.

If you insert anything as an OLE object (using Insert/Object) you're going
to get huge file sizes, no question. Inserting as OLE embeds the object's
complete native data alongside a preview picture (the entire Excel workbook,
for example, if the object is an Excel workbook or chart, not just the part
you see).

As a general rule, Insert/Object should NEVER be used to incorporate
graphics in a Word document; use Insert/Picture/From File instead.
Exceptions may be made where reproduction quality matters a great deal
and a plain picture renders poorly. But in that case you need to decide
which is more important: the file size, or the improved quality you get
by letting the associated application take over the visual rendering.
And even then there's no guarantee that the recipient will get the same
result if the application is not present on his machine.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :)
 
