D
David W. Fenton
Some of you may have been following the thread I started last week
about working with the FileSearch object. Once I found out that it's
been removed from Office 2007, I decided to see if I could replicate
the functionality I needed, and over the holiday weekend and this
evening, I've gradually put together a class module that replaces
almost all the functionality of the Office FileSearch object -- some
of it I couldn't replicated, or I didn't see any value in attempting
to do so.
I'd be very appreciative if people would bang on it and try to break
it, as well as look into some of the complicated issues I discuss
later on -- I need feedback on whether or not I've made the most
logical choices in attempt to replicate some of the more obscure
aspects of the FileSearch object.
Here's the summary from the class module header:
' FileSearch Object Properties and Methods
' PROPERTIES
' Application : irrelevant
' Creator : implemented
' FileName : implemented
' FileType : implemented
' FileTypes : implemented
' FileTypeSpecify : add by DWF
' FoundFiles : implemented
' LastModified : implemented
' LastModifiedSpecify : added by DWF
' LastModifiedSpecifyEnd : added by DWF
' LastModifiedSpecifyStart : added by DWF
' LookIn : implemented
' MatchAllWordForms : not implemented
' MatchTextExactly : implemented
' PropertyTests : not implemented
' SearchFolders : not implemented
' SearchPropertiesOnly : added by DWF
' SearchScopes : not implemented
' SearchSubFolders : implemented
' SearchTextOnly : added by DWF
' TextOrProperty : implemented
'
' METHODS
' Execute : implemented -- extended by DWF
' NewSearch : implemented
' RefreshScopes : not implemented
I could not implement MatchAllWordForms, unfortunately, as that
depends on functionality that is not exposed anywhere that I know
of.
I didn't feel that the SearchFolders and SearchScopes
collections collections were worth implementing, as they are aspects
of the FileSearch object as it exists that make it incredibly
complicated to work with. I also didn't implement the PropertyTests
collection as I couldn't see much utility in it. Because I didn't
implement the SearchScopes collection, there was no need to
implement the RefreshScopes method.
BTW, if you're wondering about a good reference for the Office
FileSearch object, all you need to do is open the VBAOF11.CHM (or 10
or 12) help file and search for "filesearch." It's is
pretty complete, and it's what I used to work things out. I don't
know why the VBE can't pull up that help file when you're coding for
the FileSearch Object, though.
I have more comments, observations and laments below, but if you're
curious right now, you can download it here:
http://dfenton.com/DFA/download/Access/FileSearch.zip
To work as designed, you also need to download the DSO OLE Document
Properties Reader:
http://support.microsoft.com/kb/224351 (version 2.1)
and register the DSOFile.dll. The class module should be able to be
used for plain text searches without needing to install and register
this DLL (I used late binding so that if it's not registered, it
doesn't cause fatal errors for non-property-based searches).
COMMENTS
========
I implemented two of the Office enumerations, for MsoLastModified
and for MsoFileType. Both raise significant questions as to what
they mean and how they should be used, but I implemented them for
consistency with the FileSearch Object.
MsoLastModified
===============
I don't know how to interpret the meaning of the constants, which
have these names:
msoLastModifiedAnyTime
msoLastModifiedLastMonth
msoLastModifiedLastWeek
msoLastModifiedThisMonth
msoLastModifiedThisWeek
msoLastModifiedToday
msoLastModifiedYesterday
(I also added msoLastModifiedSpecify since it was crucial to
implement additional functionality that I wanted to add)
AnyTime, Today and Yesterday are quite clear, of course. But it's
not clear to me hnow LastMonth, LastWeek, ThisMonth and ThisWeek
should be interpreted. I interpreted them respectively as:
msoLastModifiedLastMonth
: Between DateAdd("m", -2, Date And DateAdd("m", -1, Date
msoLastModifiedLastWeek
: Between DateAdd("ww", -2, Date And DateAdd("ww", -1, Date
msoLastModifiedThisMonth
: Between DateAdd("m", -1, Date And Date()
msoLastModifiedThisWeek
: Between DateAdd("ww", -1, Date And Date()
Another interpretation for the months would be that the current
month would be May, and the previous month would be April, and
likewise with week (current week beginning on Monday, May 25th, and
last week being the week beginning Monday, May 18th).
I don't know which makes more sense at all.
The logic behind it is in the class module's LastModified Property
Let and anyone could easily alter the definitions there to suit
their needs. MS hasn't implemented it consistently in any of the
places I looked, which included the WinXP search companion, Windows
Desktop Search and the Office FileSearch UI -- each one has
different choices and they aren't consistent with the choices
defined for the Office FileSearch object. I didn't test what the
results were, because I didn't have files with appropriate dates to
check on. I figured it's the kind of thing you'll either not use at
all (in which case it doesn't matter how I implement it), or it's
something you'll use but probably want to adjust the interpretation
for your own users. In other words, no matter what I implemented,
it's probably not going to get used in the exact form I did it if
it's used at all.
Anyway, in the sample database, I implemented a combo box of the
enumeration types, while also allowing the user to type in a
specific date. If you enter anything into the combo box other than
one of the predefined choices, the combo box's AfterUpdate event
assumes you want intended to type a specific date, and checks to see
if it's valid. If not, it prompts you to enter a valid date. Once
the valid date is entered, the TO textbox is enabled so you can
enter a range.
MsoFileType
===========
This is an enumeration of all the MS Office file types. It is
documented here:
http://msdn.microsoft.com/es-es/library/microsoft.office.core.msofile
type(VS.80).aspx
(I don't know why I ended up with the Spanish version)
I used that as a guide, but didn't follow it exactly. I also added
the OpenXML document extensions (e.g., xlsx, docx, etc.). For anyone
who uses it, note that the .FileType property handles only the file
types defined in the enumeration, which are:
msoFileTypeAllFiles
msoFileTypeBinders
msoFileTypeCalendarItem
msoFileTypeContactItem
msoFileTypeDatabases
msoFileTypeDataConnectionFiles
msoFileTypeDesignerFiles
msoFileTypeDocumentImagingFiles
msoFileTypeExcelWorkbooks
msoFileTypeJournalItem
msoFileTypeMailItem
msoFileTypeNoteItem
msoFileTypeOfficeFiles
msoFileTypeOutlookItems
msoFileTypePhotoDrawFiles
msoFileTypePowerPointPresentations
msoFileTypeProjectFiles
msoFileTypePublisherFiles
msoFileTypeTaskItem
msoFileTypeTemplates
msoFileTypeVisioFiles
msoFileTypeWebPages
msoFileTypeWordDocuments
I added a custom type:
msoFileTypeUserSpecified
so that the class module had a mechanism for handling other file
types. The .FileTypeSpecify property takes a single extension or a
comma-delimited list of extensions. The extension can be passed with
or without the wildcard character (everything before the . is
ignored). So you could assign "doc" or "*.doc" to the property and
it would have the same result, and likewise, "doc, csv, txt" or
"*.doc, *.csv, *.txt".
As I said, I interpreted the meaning of these constants more broadly
than the documentation said. For instance, there was no Excel
Spreadsheet choice that defined xls, xlt, etc. as the valid file
types, but only the ExcelWorkbooks constant. So I defined that as
being these extensions: xls, xlt, wbk, xlsx, xlsm, xltx, xltm, xlsb,
xlam (I don't know if there is an OpenXML wbk format -- I can't find
wbkx in Google, except as radio station call letters!).
Note also that for now, I've commented out all the extensions for
template files except for the msoFileTypeTemplates file type.
If in an application you need to provide a selection of file types
beyond the Office file types, it would be better to utilize only the
..FileTypeSpecify property since you could, for instance, provide a
multiselect listbox that would allow the selection of multiple file
types and then simply pass a comma-delimited list to the property. I
implemented the enumeration of Office documents in order to be
consistent with the Office FileSearch object but this somewhat
restricts the use of the .FileType property of the class module.
However, it does insure that the interface is completely consistent
in that regard with the FileSearch object -- as a replacement for
it, it will work fine.
But in general, I suspect that it will be more convenient to use the
..FileTypeSpecify property and use a table of file types to populate
a list box (or combo box) for the user to choose from.
Search File Properties
======================
In order to provide the ability to search OLE document properties, I
utilized an unsupported Microsoft ActiveX DLL, the DSO OLE Document
Properties Reader:
http://support.microsoft.com/kb/224351 (version 2.1)
As I said above, I wrote the class module with late binding so that
if it's not registered, it shouldn't be fatal. Without it, the
property searching doesn't work.
The implementation of this was very complicated, as I had to convert
VB6 code to figure out how it works, and it turned out that the
collection that the object returns (if you browse it with the object
viewer, it's the DSOFile.OleDocumentProperties.SummaryProperties
collection) is not loopable (as opposed to the
DSOFile.OleDocumentProperties.CustomProperties collection, which
worked just fine). I had to hard code each property retrieval one at
a time and that resulted in some *very* ugly code. Here's the code
for checking date properties:
Private Function SearchPropertiesDate(dteSearch As Date, _
Optional strPropertyName As String) As Boolean
Dim bolSingleProperty As Boolean
Dim bolTemp As Boolean
Dim dteLastSaved As Date
bolSingleProperty = (Len(strProperty) > 0)
If bolSingleProperty Then
Select Case strProperty
Case "DateLastSaved"
GoTo DateLastSaved
Case "DateCreated"
GoTo DateCreated
Case "DateLastPrinted"
GoTo DateLastPrinted
End Select
End If
If dteSearch = 0 Then GoTo exitRoutine
DateLastSaved:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastSaved,dteSearch)
If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateCreated:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateCreated, dteSearch)
If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateLastPrinted:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastPrinted,
dteSearch) If bolTemp Or bolSingleProperty Then GoTo exitRoutine
exitRoutine:
SearchPropertiesDate = bolTemp
End Function
Let me explain that code.
First, when searching all the properties, I wanted it to short
circuit -- that is, as soon as the first property matched the
criteria, I wanted to return True and exit. This is why it has all
those tests and "oTo exitRoutine" jumps -- it's simply that there's
no utility in knowing that two properties matched the same search
string.
Second, I also wanted to be able to provide the capability of
searching for specific properties. The only way I could think to do
that was with GoTos, which is why it's so incredibly ugly!
But it's all caused by the lack of looping through the collection.
The code sample that came with the download of the DLL also didn't
walk through the SummaryProperties collection, so I doubt that it's
possible. If someone can figure it out, I'd gladly rework that code
to make it less stupid! For now, it does work, though.
Search Results May Not Be Identical to FileSearch Object
=========================================================
I've tried to be as flexible and commonsensical as possible in my
implementation of the searching, but there's a major interaction
between the logic of the class module's searching functionality and
UI. Let me explain.
The main guts of the search process is in a subroutine called
SearchFileForText. In it, first the files are searched for text,
then for last update (through the Access FileDateTime() function),
and then for the OLE properties. Whether or not SearchFileForText
returns true depends on two things:
1. the results of the three searches.
2. the appropriate combination of those three results.
Now, because there was no way for me to have a text search ignore
properties embedded in the header of the document, I made a decision
to allow a choice only between searching Properties Only or Both
Properties and Text. This is the way the FileSearch object actually
implements it so far as I can tell (the UI in Office has never given
you the choice of searching text only, just both or properties
only), so I would tend to think it's not doing anything special
about reading only certain parts of the OLE stream.
I've tried to implement things in a way that makes sense and that
behaves sensibly, but you all know how that goes -- when you're
heads down programming on a project, you lose sight of the bigger
picture.
I hope that some of you will test it out and find bugs and maybe
figure out how to make it work better or more efficiently.
about working with the FileSearch object. Once I found out that it's
been removed from Office 2007, I decided to see if I could replicate
the functionality I needed, and over the holiday weekend and this
evening, I've gradually put together a class module that replaces
almost all the functionality of the Office FileSearch object -- some
of it I couldn't replicated, or I didn't see any value in attempting
to do so.
I'd be very appreciative if people would bang on it and try to break
it, as well as look into some of the complicated issues I discuss
later on -- I need feedback on whether or not I've made the most
logical choices in attempt to replicate some of the more obscure
aspects of the FileSearch object.
Here's the summary from the class module header:
' FileSearch Object Properties and Methods
' PROPERTIES
' Application : irrelevant
' Creator : implemented
' FileName : implemented
' FileType : implemented
' FileTypes : implemented
' FileTypeSpecify : add by DWF
' FoundFiles : implemented
' LastModified : implemented
' LastModifiedSpecify : added by DWF
' LastModifiedSpecifyEnd : added by DWF
' LastModifiedSpecifyStart : added by DWF
' LookIn : implemented
' MatchAllWordForms : not implemented
' MatchTextExactly : implemented
' PropertyTests : not implemented
' SearchFolders : not implemented
' SearchPropertiesOnly : added by DWF
' SearchScopes : not implemented
' SearchSubFolders : implemented
' SearchTextOnly : added by DWF
' TextOrProperty : implemented
'
' METHODS
' Execute : implemented -- extended by DWF
' NewSearch : implemented
' RefreshScopes : not implemented
I could not implement MatchAllWordForms, unfortunately, as that
depends on functionality that is not exposed anywhere that I know
of.
I didn't feel that the SearchFolders and SearchScopes
collections collections were worth implementing, as they are aspects
of the FileSearch object as it exists that make it incredibly
complicated to work with. I also didn't implement the PropertyTests
collection as I couldn't see much utility in it. Because I didn't
implement the SearchScopes collection, there was no need to
implement the RefreshScopes method.
BTW, if you're wondering about a good reference for the Office
FileSearch object, all you need to do is open the VBAOF11.CHM (or 10
or 12) help file and search for "filesearch." It's is
pretty complete, and it's what I used to work things out. I don't
know why the VBE can't pull up that help file when you're coding for
the FileSearch Object, though.
I have more comments, observations and laments below, but if you're
curious right now, you can download it here:
http://dfenton.com/DFA/download/Access/FileSearch.zip
To work as designed, you also need to download the DSO OLE Document
Properties Reader:
http://support.microsoft.com/kb/224351 (version 2.1)
and register the DSOFile.dll. The class module should be able to be
used for plain text searches without needing to install and register
this DLL (I used late binding so that if it's not registered, it
doesn't cause fatal errors for non-property-based searches).
COMMENTS
========
I implemented two of the Office enumerations, for MsoLastModified
and for MsoFileType. Both raise significant questions as to what
they mean and how they should be used, but I implemented them for
consistency with the FileSearch Object.
MsoLastModified
===============
I don't know how to interpret the meaning of the constants, which
have these names:
msoLastModifiedAnyTime
msoLastModifiedLastMonth
msoLastModifiedLastWeek
msoLastModifiedThisMonth
msoLastModifiedThisWeek
msoLastModifiedToday
msoLastModifiedYesterday
(I also added msoLastModifiedSpecify since it was crucial to
implement additional functionality that I wanted to add)
AnyTime, Today and Yesterday are quite clear, of course. But it's
not clear to me hnow LastMonth, LastWeek, ThisMonth and ThisWeek
should be interpreted. I interpreted them respectively as:
msoLastModifiedLastMonth
: Between DateAdd("m", -2, Date And DateAdd("m", -1, Date
msoLastModifiedLastWeek
: Between DateAdd("ww", -2, Date And DateAdd("ww", -1, Date
msoLastModifiedThisMonth
: Between DateAdd("m", -1, Date And Date()
msoLastModifiedThisWeek
: Between DateAdd("ww", -1, Date And Date()
Another interpretation for the months would be that the current
month would be May, and the previous month would be April, and
likewise with week (current week beginning on Monday, May 25th, and
last week being the week beginning Monday, May 18th).
I don't know which makes more sense at all.
The logic behind it is in the class module's LastModified Property
Let and anyone could easily alter the definitions there to suit
their needs. MS hasn't implemented it consistently in any of the
places I looked, which included the WinXP search companion, Windows
Desktop Search and the Office FileSearch UI -- each one has
different choices and they aren't consistent with the choices
defined for the Office FileSearch object. I didn't test what the
results were, because I didn't have files with appropriate dates to
check on. I figured it's the kind of thing you'll either not use at
all (in which case it doesn't matter how I implement it), or it's
something you'll use but probably want to adjust the interpretation
for your own users. In other words, no matter what I implemented,
it's probably not going to get used in the exact form I did it if
it's used at all.
Anyway, in the sample database, I implemented a combo box of the
enumeration types, while also allowing the user to type in a
specific date. If you enter anything into the combo box other than
one of the predefined choices, the combo box's AfterUpdate event
assumes you want intended to type a specific date, and checks to see
if it's valid. If not, it prompts you to enter a valid date. Once
the valid date is entered, the TO textbox is enabled so you can
enter a range.
MsoFileType
===========
This is an enumeration of all the MS Office file types. It is
documented here:
http://msdn.microsoft.com/es-es/library/microsoft.office.core.msofile
type(VS.80).aspx
(I don't know why I ended up with the Spanish version)
I used that as a guide, but didn't follow it exactly. I also added
the OpenXML document extensions (e.g., xlsx, docx, etc.). For anyone
who uses it, note that the .FileType property handles only the file
types defined in the enumeration, which are:
msoFileTypeAllFiles
msoFileTypeBinders
msoFileTypeCalendarItem
msoFileTypeContactItem
msoFileTypeDatabases
msoFileTypeDataConnectionFiles
msoFileTypeDesignerFiles
msoFileTypeDocumentImagingFiles
msoFileTypeExcelWorkbooks
msoFileTypeJournalItem
msoFileTypeMailItem
msoFileTypeNoteItem
msoFileTypeOfficeFiles
msoFileTypeOutlookItems
msoFileTypePhotoDrawFiles
msoFileTypePowerPointPresentations
msoFileTypeProjectFiles
msoFileTypePublisherFiles
msoFileTypeTaskItem
msoFileTypeTemplates
msoFileTypeVisioFiles
msoFileTypeWebPages
msoFileTypeWordDocuments
I added a custom type:
msoFileTypeUserSpecified
so that the class module had a mechanism for handling other file
types. The .FileTypeSpecify property takes a single extension or a
comma-delimited list of extensions. The extension can be passed with
or without the wildcard character (everything before the . is
ignored). So you could assign "doc" or "*.doc" to the property and
it would have the same result, and likewise, "doc, csv, txt" or
"*.doc, *.csv, *.txt".
As I said, I interpreted the meaning of these constants more broadly
than the documentation said. For instance, there was no Excel
Spreadsheet choice that defined xls, xlt, etc. as the valid file
types, but only the ExcelWorkbooks constant. So I defined that as
being these extensions: xls, xlt, wbk, xlsx, xlsm, xltx, xltm, xlsb,
xlam (I don't know if there is an OpenXML wbk format -- I can't find
wbkx in Google, except as radio station call letters!).
Note also that for now, I've commented out all the extensions for
template files except for the msoFileTypeTemplates file type.
If in an application you need to provide a selection of file types
beyond the Office file types, it would be better to utilize only the
..FileTypeSpecify property since you could, for instance, provide a
multiselect listbox that would allow the selection of multiple file
types and then simply pass a comma-delimited list to the property. I
implemented the enumeration of Office documents in order to be
consistent with the Office FileSearch object but this somewhat
restricts the use of the .FileType property of the class module.
However, it does insure that the interface is completely consistent
in that regard with the FileSearch object -- as a replacement for
it, it will work fine.
But in general, I suspect that it will be more convenient to use the
..FileTypeSpecify property and use a table of file types to populate
a list box (or combo box) for the user to choose from.
Search File Properties
======================
In order to provide the ability to search OLE document properties, I
utilized an unsupported Microsoft ActiveX DLL, the DSO OLE Document
Properties Reader:
http://support.microsoft.com/kb/224351 (version 2.1)
As I said above, I wrote the class module with late binding so that
if it's not registered, it shouldn't be fatal. Without it, the
property searching doesn't work.
The implementation of this was very complicated, as I had to convert
VB6 code to figure out how it works, and it turned out that the
collection that the object returns (if you browse it with the object
viewer, it's the DSOFile.OleDocumentProperties.SummaryProperties
collection) is not loopable (as opposed to the
DSOFile.OleDocumentProperties.CustomProperties collection, which
worked just fine). I had to hard code each property retrieval one at
a time and that resulted in some *very* ugly code. Here's the code
for checking date properties:
Private Function SearchPropertiesDate(dteSearch As Date, _
Optional strPropertyName As String) As Boolean
Dim bolSingleProperty As Boolean
Dim bolTemp As Boolean
Dim dteLastSaved As Date
bolSingleProperty = (Len(strProperty) > 0)
If bolSingleProperty Then
Select Case strProperty
Case "DateLastSaved"
GoTo DateLastSaved
Case "DateCreated"
GoTo DateCreated
Case "DateLastPrinted"
GoTo DateLastPrinted
End Select
End If
If dteSearch = 0 Then GoTo exitRoutine
DateLastSaved:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastSaved,dteSearch)
If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateCreated:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateCreated, dteSearch)
If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateLastPrinted:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastPrinted,
dteSearch) If bolTemp Or bolSingleProperty Then GoTo exitRoutine
exitRoutine:
SearchPropertiesDate = bolTemp
End Function
Let me explain that code.
First, when searching all the properties, I wanted it to short
circuit -- that is, as soon as the first property matched the
criteria, I wanted to return True and exit. This is why it has all
those tests and "oTo exitRoutine" jumps -- it's simply that there's
no utility in knowing that two properties matched the same search
string.
Second, I also wanted to be able to provide the capability of
searching for specific properties. The only way I could think to do
that was with GoTos, which is why it's so incredibly ugly!
But it's all caused by the lack of looping through the collection.
The code sample that came with the download of the DLL also didn't
walk through the SummaryProperties collection, so I doubt that it's
possible. If someone can figure it out, I'd gladly rework that code
to make it less stupid! For now, it does work, though.
Search Results May Not Be Identical to FileSearch Object
=========================================================
I've tried to be as flexible and commonsensical as possible in my
implementation of the searching, but there's a major interaction
between the logic of the class module's searching functionality and
UI. Let me explain.
The main guts of the search process is in a subroutine called
SearchFileForText. In it, first the files are searched for text,
then for last update (through the Access FileDateTime() function),
and then for the OLE properties. Whether or not SearchFileForText
returns true depends on two things:
1. the results of the three searches.
2. the appropriate combination of those three results.
Now, because there was no way for me to have a text search ignore
properties embedded in the header of the document, I made a decision
to allow a choice only between searching Properties Only or Both
Properties and Text. This is the way the FileSearch object actually
implements it so far as I can tell (the UI in Office has never given
you the choice of searching text only, just both or properties
only), so I would tend to think it's not doing anything special
about reading only certain parts of the OLE stream.
I've tried to implement things in a way that makes sense and that
behaves sensibly, but you all know how that goes -- when you're
heads down programming on a project, you lose sight of the bigger
picture.
I hope that some of you will test it out and find bugs and maybe
figure out how to make it work better or more efficiently.