Two 'Types' in Same Document?


John Hanley

I have some Microsoft Word files which show the following attributes in the
same document:

1. Windows Explorer|Folder|Properties|Details: shows "Type" to be
Microsoft Office Word 97-2003 Document with the file extension of .doc

2. From within the document -- Office Button|Prepare|Properties|Document
Properties|Advanced Properties|General: shows "Type: Rich Text Format
Document".

So I am trying to understand how the same document can have two different
'Type' designations. Which one is the 'real' Type?

Actually what I would really like to do is to use the Vista search function
to find all instances of such 'dual type' documents; any hints as to how I
might do that? Thanks.
 

Jay Freedman

I can think of at least two ways this might have happened:

- The document was saved with the "Save As Type" setting of "Word 97-2003 &
6.0/95 - RTF (*.doc)".
or
- The document was saved with the "Rich Text Format (*.rtf)" setting and
later renamed with a .doc extension.

Windows Explorer determines its "Type" by looking at the extension. Anything
named *.doc is considered to be a Microsoft Office Word 97-2003 Document.

Word, once it loads the document, ignores the extension and instead looks at
the file contents.

- A binary-format Word document starts with the bytes (hex) D0 CF 11 E0.
- An RTF document starts with the characters {\rtf1.

If the question makes sense at all, I would say that the type Word reads
from the file contents is the "real" type. Once the document is in memory,
though, it should be identical regardless of how it's stored on disk.

I don't think there's any way you can convince Windows Explorer to tell you
when a file is RTF masquerading as a .doc file. If you have some programming
skills, it wouldn't be too hard to write a program -- or even a Word
macro -- to loop through a folder and list the files that have a .doc
extension but whose contents start with the RTF characters.
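The kind of program described above could be sketched in Python roughly as
follows. This is only a minimal sketch, not a tested tool: the function names
(sniff_type, find_rtf_named_doc) are made up for illustration, and it assumes
that checking the first few bytes of each file against the signatures listed
above is enough to tell RTF from binary Word format.

```python
import os
import sys

# Signatures quoted earlier in this thread.
RTF_MAGIC = b"{\\rtf"                  # RTF text begins with {\rtf (usually {\rtf1)
OLE_MAGIC = bytes.fromhex("D0CF11E0")  # binary-format Word document header


def sniff_type(path):
    """Return 'rtf', 'binary-doc', or 'unknown' from the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(OLE_MAGIC):
        return "binary-doc"
    return "unknown"


def find_rtf_named_doc(root):
    """Walk root and return sorted paths of .doc files whose contents are RTF."""
    hits = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".doc"):
                continue
            path = os.path.join(folder, name)
            try:
                if sniff_type(path) == "rtf":
                    hits.append(path)
            except OSError:
                # Skip files that cannot be opened (locked, no permission).
                continue
    return sorted(hits)


if __name__ == "__main__":
    # Folder to scan is the first argument; default is the current folder.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for p in find_rtf_named_doc(root):
        print(p)
```

Saved under a name of your choosing and run with a folder path as its only
argument, it prints one line per .doc file that is really RTF inside. It only
reads the files, so it does not change or convert anything.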

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 

John Hanley

Thanks Jay. For my next step, I converted one of the 'dual type' files (via
Save As) to the Word 2007 format, .docx; the 'duality' is not present in this
version. Then I converted another 'dual' to a (straight) Microsoft Office Word
97-2003 Document (.doc), and the 'duality' is not present there either. So
converting them to straight .doc or .docx gets rid of the duality.

Now then -- the reason I am interested in this is as follows: when doing a
search in Vista, a few of my files (known to contain the search term) would
not show up in the results for a string within the document. It turns out
that all the examples I have of this were files that had the dual .doc/.rtf
characteristic. I have now converted those to straight .doc or .docx files,
and behold, a search within their contents now finds the words previously not
found. So I would like to find any other examples of this dual behavior, to
be sure I have everything searchable; make sense? I can readily see that I
have 322 documents with the .doc (not .docx) extension, but I do not know how
many of those are also .rtf inside.

Interestingly, I have some straight .rtf Word documents and they search ok
for contents. I also have some other kinds of 'duals' -- some that are
'Microsoft Word 6.0/95' and .doc; these search contents ok.

It just seems to be the ones with that .doc/.rtf duality (possibly that is
just an artifact of something else, but it is what I see so far). I took
a brief, random look through the 322 .doc documents and found a few more
.doc/.rtf duals -- and sure enough, none of them would show up in a content
search.

Well, I have traced the problem pretty well; at least I think I know that
those 'dual' type files can hinder the search, although obviously I do not
know why. Computers............

Thanks for your input.
 
