XML vs. docx file formats

Paul

I know Office 2007 has a new default file format, .docx. The only things I
heard about it were how beneficial it would be for large firms, to be able to
quickly and easily aggregate data in such documents automatically. Not being
a large multinational corporation, I am wondering if my large .doc files
(with a few tables and graphics) would be more efficiently handled by saving
my files as XML with Office 2003. I’m not having that much of a problem,
just noticing some delay in displaying the graphics, but when I saved my .doc
as an xml, it displayed the graphics as quickly as text, almost to the
nanosecond. Is this just my imagination, or the benefit of closing out Word
before opening up that file, or is XML really a better file structure for
anything more complicated than text with bold and italics, etc.?

If the answer is that XML really is better for these types of files, would
there be any benefit in my installing the Office 2007 Compatibility Pack?
Would it make any difference if I used Office 2003 to save my files as
.docx? Or are these types of files essentially the same? If that’s the case,
I’d pretty much be wasting my time with a Compatibility Pack then, right?

When saving the document as XML, I have noticed two options to check—save
data only and apply transform. Should these be checked?

--
Paul

MS Office Pro 2003
XP Home SP3
Dell Inspiron 1501
 
Graham Mayor

If it ain't broke - don't fix it!

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP

My web site www.gmayor.com

<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Paul

Given that I've asked a slew of questions, I'm not sure which one you are
referring to. You'd advise not using xml file format?
--
Paul

MS Office Pro 2003
XP Home SP3
Dell Inspiron 1501
 
Graham Mayor

XML is an answer to a problem that you don't have, so unless there is a
pressing reason to use it that you haven't mentioned - stick with DOC
format.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP

My web site www.gmayor.com

<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Peter Jamieson

A .docx file is actually a set of .xml format files and other file types
organised within a .zip file. .docm is the equivalent for documents with
macros.
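
FWIW, a minimal Python sketch of the "set of files within a .zip" point. It lists the parts inside a package using the standard zipfile module. The helper names (list_package_parts, build_sample_package) are hypothetical, and the sample package is a tiny in-memory stand-in, not a valid Word document; with a real file you would pass its path instead.

```python
import io
import zipfile


def list_package_parts(docx_file):
    """Return (name, uncompressed size) for every part in a .docx/.docm package."""
    with zipfile.ZipFile(docx_file) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


def build_sample_package():
    """Build a tiny stand-in package in memory (illustrative content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
        # Graphics are stored as separate binary parts, not as XML text.
        package.writestr("word/media/image1.png", b"\x89PNG")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, size in list_package_parts(build_sample_package()):
        print(f"{size:>6}  {name}")
```

Calling list_package_parts("notes.docx") on a real document (the file name here is only an example) should show the same kind of layout: XML parts for the text and settings, plus binary parts for any embedded pictures.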

Word also has (at least) two other .xml formats:
a. the .xml format you can save from Word 2003. This is a single-file
format that uses Word's "WordprocessingML" vocabulary. That's
probably the one you have been saving to from Word 2003. You can also
save to this format from Word 2007.
b. a newer single-file .xml format you can save from Word 2007.

Another main (perceived) benefit of the XML formats is that XML is
inherently more "readable" than the .doc binary format, which means that
if you keep your documents for a very long time, you are less likely to
be left high and dry because nothing understands the .doc binary format
any more. In theory, the XML formats benefit from several things:
a. XML itself is a widely used set of standards
b. XML code is, to an extent, "readable"
c. the .docx format is now an ISO standard

However,
a. not everything in an XML format file is necessarily so easily read.
Some things such as embedded objects may still be in a binary format.
b. although XML is probably a safer bet on the longevity front,
Microsoft started making the .doc binary format public last year (AIUI
it's still a work in progress), making it more likely that /something/
will be able to read those .doc files in (say) 50 years :)
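
To illustrate the "readable" point, here is a rough Python sketch that pulls the paragraph text out of a Word 2003 single-file XML document using only the standard library. The sample markup is a hand-written, heavily trimmed stand-in (a real file saved by Word carries far more markup), and the function name paragraph_texts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Word 2003 single-file XML vocabulary.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written, trimmed sample for illustration only.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Lecture notes</w:t></w:r></w:p>
    <w:p>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Bold</w:t></w:r>
      <w:r><w:t> and plain text</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(xml_text):
    """Return the plain text of each paragraph (w:p), ignoring formatting."""
    root = ET.fromstring(xml_text)
    ns = {"w": W_NS}
    texts = []
    for para in root.iterfind(".//w:p", ns):
        # A paragraph is split into runs (w:r); the text sits in w:t elements.
        texts.append("".join(t.text or "" for t in para.iterfind(".//w:t", ns)))
    return texts


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))  # ['Lecture notes', 'Bold and plain text']
```

The text comes out with a general-purpose XML parser and no knowledge of Word. Embedded pictures and objects are a different matter, as noted in (a).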

<<
When saving the document as XML, I have noticed two options to check: save
data only and apply transform. Should these be checked?
>>

No:
a. "Save data only" is used when you are using a "user defined XML
schema" to help structure your document. In a nutshell, your document is
then rather like a form, with areas that you can type in surrounded by
"trim". When you "save data only", Word saves an XML document that
conforms to that user-defined XML schema and contains the data from the
form. It doesn't contain the "trim" and it isn't WordprocessingML.
b. "Apply transform" is used if you really want to transform either
your WordprocessingML document or your "user defined XML" document into
another format using an XSLT transform.

Personally, if I were working in Word 2003, I'd probably save in .doc
unless I had to communicate with people who could only open .docx. If I were
working in Word 2007 I'd probably use .docx unless I had to communicate
with people who could only read .doc. Chances are that I'd end up saving
anything I wanted to keep in both formats until .doc starts dying out.


Peter Jamieson

http://tips.pjmsn.me.uk
 
Paul

Thanks for your reply. I did install a converter pack for Office 2003 (which
lets me save in the .docx format), but I couldn't see the "single-file" xml
format that you said was in 2007 (evidently you need not the converter pack,
but the Office 2007 software itself).

At any rate, as it looks like more and more of my docs will have graphics
that are making Word slow down when saved in the .doc format, I think I'll
save these types of files in the .xml (single-file, 2003) format.

These files are nothing that I need to share with anyone else. They are
personal lecture notes that I am constantly opening and closing throughout
the day, so a few seconds here and there will add up. The .docx format
required several seconds every save (I've been in the habit lately of saving
often, so that is an issue for me).

If I were sharing files of course I'd use .doc but that's not the issue here.
--
Paul

MS Office Pro 2003
XP Home SP3
Dell Inspiron 1501
 
grammatim

If you link to your graphics instead of importing them into your
documents, your saves should be faster.
 
Paul

I don't think that would work for me. The vast majority of these "graphics"
are screenshots of images of text from Google Books, and I need to see these
because of the text they contain. Also, I'd be in danger if I ever decided to
do some cleaning and moved/deleted them. It's just safer to keep them in the
same document.
--
Paul

MS Office Pro 2003
XP Home SP3
Dell Inspiron 1501
 
Peter Jamieson

FWIW by the single-file format, I meant the format that your questions
were about:

<<
When saving the document as XML, I have noticed two options to check: save
data only and apply transform. Should these be checked?
>>

Peter Jamieson

http://tips.pjmsn.me.uk
 
