Hi Darin!
I would like to check if a document is a Word document. I'm currently just
checking for the "doc" extension, but the extension can be changed, so
that's no good.
Any examples or guidance on how to accomplish this?
Of course you could use the Word Object Model to load the file. This would
require Word to be installed on the target machine, and it doesn't scale
well. So forget it.
Another solution would be to use a binary reader and check the file header:
every Word file, for example, begins with the bytes D0 CF 11 E0. However, so
does every Excel file, because both are OLE compound files. So forget it.
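If you still want the header test as a cheap first filter (it rules out
renamed text files, though not Excel files), a minimal sketch could look
like this. The class and method names are made up for illustration, and the
eight-byte value is the full compound-file signature, of which only the
first four bytes are quoted above:

```csharp
using System;
using System.IO;

static class CompoundFileCheck
{
    // Signature at the start of an OLE compound file.
    // The first four bytes are the "D0CF11E0" mentioned above.
    static readonly byte[] Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true when the stream starts with the signature.
    public static bool HasOleSignature(Stream stream)
    {
        byte[] header = new byte[Signature.Length];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) return false; // stream is shorter than the signature
            read += n;
        }
        for (int i = 0; i < Signature.Length; i++)
        {
            if (header[i] != Signature[i]) return false;
        }
        return true;
    }

    // Convenience overload (hypothetical) that opens the file read-only.
    public static bool HasOleSignature(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return HasOleSignature(fs);
        }
    }
}
```

A true result only says "this is some OLE compound file", so it cannot
replace the property check described next.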
My recommendation is:
If you right-click a Word file in Explorer (on an NTFS volume), click on
"Properties" and then on the "file info" tab, you'll find an entry named
"Application Name". For a DOC file on my machine this entry is "Microsoft
Office Word". We have to retrieve the value of this entry. The entry is
missing both for a txt file (even if it has been created by Word) and for
a file that only has the extension "DOC" but isn't a Word file.
All we need to achieve this is a custom wrapper around the OLE Structured
Storage-COM-Interfaces. The VB.NET MVP Eduardo A. Morcillo has already done
this. You can download his library here:
http://www.mvps.org/emorcillo/dotnet/grl/
The library itself is written in VB.NET, but this doesn't matter of course,
as long as it is .NET.
The rest is trivial: install the library, add a reference to "Edanmo's OLE
Storage Classes" and have a look at the following code which I have
commented for you:
using System.Diagnostics;
using Edanmo.OleStorage;
Storage file;
PropertySetStorage propSet;
PropertyStorage propStg;
//open file
file = new Storage("c:\\test.doc");
//get Property Set Storage
propSet = file.PropertySetStorage;
//properties are divided into groups (property sets).
//3 groups are predefined:
//SummaryInformation, DocSummaryInformation and UserProperties
//in this case we want the SummaryInformation
propStg = propSet.Open(PropertySetStorage.FMTID_SummaryInformation);
//read the Application Name entry
//the PropertyStorage.SummaryProperty enum has to be
//explicitly cast to int in C#
//
//results on my machine in "Microsoft Office Word"
Debug.WriteLine(
    "Application Name: "
    + propStg[(int)PropertyStorage.SummaryProperty.ApplicationName]);
//clean up!
propStg.Close();
propSet.Close();
file.Close();
In your implementation you could simply check whether
propStg[(int)PropertyStorage.SummaryProperty.ApplicationName] equals
"Microsoft Office Word" and you're done.
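A rough sketch of that comparison, kept separate from the library so it can
be tried on its own. The helper name is made up. It takes an object because
I'm not relying on the exact return type of the indexer, and it matches
loosely ("Microsoft" plus "Word") on the assumption that other Word versions
may write a slightly different string, such as one with a version number.
Only "Microsoft Office Word" is confirmed on my machine:

```csharp
using System;

static class WordCheck
{
    // Hypothetical helper: decides whether an "Application Name" value
    // looks like it was written by Word.
    public static bool IsWordApplicationName(object applicationName)
    {
        string name = applicationName as string;
        if (name == null) return false; // entry missing or not a string

        name = name.Trim();
        // Assumption: the value starts with "Microsoft" and mentions "Word".
        return name.StartsWith("Microsoft", StringComparison.OrdinalIgnoreCase)
            && name.IndexOf("Word", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

You would call it as
WordCheck.IsWordApplicationName(propStg[(int)PropertyStorage.SummaryProperty.ApplicationName])
before closing the storages. If you only care about one Word version, a plain
string comparison is stricter and just as easy.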
Of course this will work for all other OLE Storage Containers as well
(Excel, PowerPoint, etc.).
Cheers
Arne Janning