MS Word Document File Structure -- Need Published Description of Structure

  • Thread starter Gray M. Strickland

Gray M. Strickland

Can anyone point me to a published description of the MS Word document
(.doc) file structure? I need to write an application which will run over
10,000 Word documents and change a single setting -- WITHOUT using Word or
any of the existing Word dlls. If I could find a clearly published
documentation of the file structure, writing a file parser to change all the
files would be pretty easy.

Gray Strickland
 
Jonathan West

Gray M. Strickland said:
Can anyone point me to a published description of the MS Word document
(.doc) file structure? I need to write an application which will run over
10,000 Word documents and change a single setting -- WITHOUT using Word or
any of the existing Word dlls. If I could find a clearly published
documentation of the file structure, writing a file parser to change all
the files would be pretty easy.

Gray Strickland


Microsoft doesn't publish the structure (called the Binary File Format). I
think they do provide it to selected companies under NDA and on the basis of
an agreement not to use the information to produce a clone of Word.

What is the single setting you want to change?
 
Gray M. Strickland

Jonathan West said:
Microsoft doesn't publish the structure (called the Binary File Format). I
think they do provide it to selected companies under NDA and on the basis
of an agreement not to use the information to produce a clone of Word.

What is the single setting you want to change?

The value I would change, if I could, is the name and path of the template
associated with the document.

A bug was introduced with Word XP which causes it to be very, very slow when
opening documents for which the associated template no longer exists, has
been moved or renamed. See Microsoft Knowledge Base Article 830561
(Documents that have attached templates take a long time to open in Word
2002 http://support.microsoft.com/default.aspx?scid=kb;en-us;830561 ) and
Microsoft Knowledge Base Article 823372 (Your Word Documents Take a Long
Time to Open When They Have Attached Templates --
http://support.microsoft.com/default.aspx?scid=kb;en-us;823372 ). This was
supposedly fixed in the hotfix described in KB 823372. It wasn't.

My small law firm started ten years ago on Word 6, then upgraded to Word 95,
97, 2000 and we're presently on Word XP. We have over 10,000 documents. As
soon as we moved up to Word XP, it started taking over four minutes per
document to open (if the document's template no longer exists, has been
renamed or moved). Word is completely unresponsive for the whole of that
time. Apparently, Word is trying to find and open the
original template. Why, I have no idea. I thought the whole idea of a
template was to fill the new document at the time of creation. Once the new
doc is created, there is no reason I can see why the old template is needed
for anything. In 10 years, we have changed the names (twice) of our file
server and have changed the share name and path where templates are stored,
so at least 80% of our documents are affected by this bug.

If I knew the Word document file structure, I'd pay a programmer to write a
file parser to run over all our documents and change the template value of
every single one to NORMAL.DOT. A file parser would not use Word's code, so
it could change all the documents in an hour or two. We wrote a VB routine
which will do this, but it too takes more than 4 minutes per file because it
is using Word's code to open the files. 4 min. X 10,000 files = 40,000 min =
27.78 DAYS. The VB routine we wrote also falls prey to two other problems:
(1) if the file is marked "read only recommended", the routine has to save it
under a new name, then rename the new file to the old name; (2) if the file
was created under Word 6.0, the routine hangs when it tries to save the file,
due to the prompt about saving in Word 6 vs. Word 97/2000/XP format.
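
To put numbers on the estimate above, here is a minimal Python sketch of the arithmetic. The function names are hypothetical, and the two-hour figure is only the upper end of the "hour or two" guess for a parser, not a measurement.

```python
from datetime import timedelta

SECONDS_PER_DAY = 86_400


def batch_duration(file_count: int, minutes_per_file: float) -> timedelta:
    """Total wall-clock time to process file_count files one after another."""
    return timedelta(minutes=file_count * minutes_per_file)


def per_file_budget(file_count: int, total: timedelta) -> float:
    """Seconds available per file if the whole batch must finish within total."""
    return total.total_seconds() / file_count


# Figures from the post: 10,000 documents at about 4 minutes each through Word.
via_word = batch_duration(10_000, 4)
days_via_word = via_word.total_seconds() / SECONDS_PER_DAY

# Assumption: "an hour or two" is taken as 2 hours for a direct file parser.
parser_budget = per_file_budget(10_000, timedelta(hours=2))

print(f"Via Word: {days_via_word:.2f} days")
print(f"Parser budget: {parser_budget:.2f} s per file")
```

In other words, a parser would have to handle each file in well under a second to meet that target, against roughly four weeks of continuous running when every file goes through Word.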

Gray Strickland
 
Jonathan West

Normally, this could be handled by using DSOFile.DLL to change the document
property, but unfortunately, the attached template is a read-only property
for dsofile.

All I can suggest is that you follow the steps *very* carefully to apply the
patch, to make sure that it takes. Then using Word to change the template
will not take so long.

You might also consider in future installing templates always on the C drive
of each computer. This has the additional advantage that if the server or
the network goes down for any reason, people can still access their
templates and get on with creating documents while the problem is fixed.
 
Gray M. Strickland

Jonathan West said:
All I can suggest is that you follow the steps *very* carefully to apply
the patch, to make sure that it takes. Then using Word to change the
template will not take so long.

I was mistaken about the hotfix/patch. We have *not* applied it. (I thought
my consultant had, but...) In fact, we're waiting on MS to respond to a paid
support call requesting the hotfix. For reasons unknown, it is not readily
available for download.

Jonathan West said:
You might also consider in future installing templates always on the C
drive of each computer. This has the additional advantage that if the
server or the network goes down for any reason, people can still access
their templates and get on with creating documents while the problem is
fixed.

User templates are local, but we have no problems there as the only local
templates ever used are normal.dot. All other templates are workgroup
templates and are stored on a file server, which is what, in part, has
caused the problem. Over the years, the file server where the templates are
stored has changed from \\nt1 to \\fred to \\bambam and the share name has
changed as well. If the server name had never changed and if the share name
had never changed, we would not be having this problem now.
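
As a rough illustration of how the affected documents could be identified, here is a minimal Python sketch that flags template paths pointing at a retired server. It assumes the attached template path of each document has already been read by some other means, which the sketch does not attempt. The server names come from the history above; the share and file names in the example are made up.

```python
from typing import Optional

# Server names from the post: templates moved from \\nt1 to \\fred to \\bambam.
RETIRED_SERVERS = {"nt1", "fred"}


def unc_server(path: str) -> Optional[str]:
    """Return the lower-cased server name of a UNC path, or None if not UNC."""
    if not path.startswith("\\\\"):
        return None
    server = path[2:].split("\\")[0]
    return server.lower() if server else None


def points_to_retired_server(path: str) -> bool:
    """True if the path is a UNC path on one of the retired servers."""
    return unc_server(path) in RETIRED_SERVERS


# Hypothetical template paths; the share name "templates" is an assumption.
examples = [
    r"\\nt1\templates\letter.dot",
    r"\\FRED\templates\pleading.dot",
    r"\\bambam\templates\letter.dot",
    r"C:\Templates\normal.dot",
]
stale = [p for p in examples if points_to_retired_server(p)]
print(stale)
```

The comparison is case-insensitive because Windows server names are. A changed share name on the current server would need a similar check against the share component.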

CAN ANYONE EXPLAIN why Word even cares about the template after the document
is initially created/populated/filled? Since changes to a template after
document creation are not reflected back to the previously created
documents, and subsequent changes to the documents are not reflected back to
the template, what possible purpose is served by a document maintaining a
connection to the template upon which it was based at creation?
 
Jonathan West

Gray M. Strickland said:
I was mistaken about the hotfix/patch. We have *not* applied it. (I
thought my consultant had, but...) In fact, we're waiting on MS to respond
to a paid support call requesting the hotfix. For reasons unknown, it is
not readily available for download.

Make sure that once they acknowledge that the hotfix is what you need, they
do not charge you for that service call.

Gray M. Strickland said:
User templates are local, but we have no problems there as the only local
templates ever used are normal.dot. All other templates are workgroup
templates and are stored on a file server, which is what, in part, has
caused the problem. Over the years, the file server where the templates
are stored has changed from \\nt1 to \\fred to \\bambam and the share name
has changed as well. If the server name had never changed and if the share
name had never changed, we would not be having this problem now.

Likewise, if you store company templates locally, you don't have this
problem. There are any number of ways of pushing updates out to the local
C-drive.

Gray M. Strickland said:
CAN ANYONE EXPLAIN why Word even cares about the template after the
document is initially created/populated/filled? Since changes to a
template after document creation are not reflected back to the previously
created documents, and subsequent changes to the documents are not
reflected back to the template, what possible purpose is served by
a document maintaining a connection to the template upon which it was based
at creation?

Because the attached template stores the following (amongst other things)
which are not copied to the document

- macros
- toolbars
- keyboard customisations
- autotext entries

All of these are things that you may want to use during editing, even after
you have saved and closed the document and later re-open it for more editing.
 
Beth Melton

Requesting a hotfix doesn't require paid support.

Just call 1-800-936-4900 (wait for hotfix instructions #3) and tell
them you need to obtain the hotfix described in KB 823372 and they
should set you up in no time.

--
Please post all follow-up questions to the newsgroup. Requests for
assistance by email can not be acknowledged.

~~~~~~~~~~~~~~~
Beth Melton
Microsoft Office MVP

Word FAQ: http://mvps.org/word
TechTrax eZine: http://mousetrax.com/techtrax/
MVP FAQ site: http://mvps.org/
 
Gray M. Strickland

I have received the hotfix (had to pay $35 to get it) and *tried* to apply
it. However, it refuses to install because I am running XP Pro Service Pack
2 -- it requires SP 1.
 
Suzanne S. Barnhill

If it requires SP 1, that suggests that it was already incorporated in SP 2,
in which case you already have it, and the hotfix is not the real answer to
your problem. Especially if you had to pay for the hotfix (which you are not
supposed to), I would get back to MS and insist on a proper solution.

--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 
Beth Melton

Hi Gray,

What was the case number they assigned (starts with SRX)?

If you were charged upfront then it should have been refunded. If it
wasn't then get your case number, call back and ask for a refund.

FWIW I have called the phone number I provided you numerous times,
mainly to verify the process, and have never been asked to pay
upfront, nor have I ever been charged for a hotfix.

 
Ostrov Dmitry

Dear Gray, contact us at (e-mail address removed). Since we know the MS Word
binary format well enough, maybe we can help you, though we can't promise.
 
Beth Melton

The reason your case number doesn't start with SRX is that it's a
web case. Is there a reason you created the case on the web rather
than calling the phone number I provided and requesting the hotfix?
Since you created a web case then yes, you would automatically be
charged upfront.

Also, did you contact them and tell them it has been resolved? They
should refund the fee once it's closed, provided your support
request was only for the hotfix.
 
Gray M. Strickland

We called on the phone. We paid $35. We were told someone would call us
back. After waiting over four days with no call back, I decided to go
through the web site, pay another $35 and see if that route was any better.

I appreciate that you think I was overcharged. I'm not in favor of giving
away money. But at this point, I'd pay a helluva lot more to fix this
problem. I don't know whether MS broke something in Word XP or in XP Pro,
but either way, I have to kill this bug somehow and paying $35 or $70 is
trivial compared to the magnitude of the problem -- at least for me.
 
Beth Melton

I did ask someone to check on this and they said it's still considered
open. Until you close it they can't provide a refund.

So if you're interested I'd call back and tell them it's closed.
 
