I'm posting all of this here in the thread, but I think I'll repost the
whole thing in a new thread, in the hopes of attracting the attention
of those who've stopped paying attention to this particular thread.
I agree. I wouldn't know either. Thinking about it, it might be an
idea to offer the user 2 search options:
1. Search filenames only
2. Search filenames and text in files
Then, if the user chooses option 1 you don't have to open any
file, and search the filenames only, which will give a quicker
result.
I've already implemented it that way. It's very fast, of course, as
it's just the result of a Dir(). I haven't sorted the results,
though. I'd leave that up to someone using the class module (they'd
just have to take the .FilesFound collection and output it to an
array and sort it).
Anyway, I've gone ahead and finished it up. While it's intended as a
replacement for most of the functionality that is found in the
FileSearch object, but some of it I couldn't replicated, or didn't
feel had any value.
Here's the summary from the class module header:
' FileSearch Object Properties and Methods
' PROPERTIES
' Application : irrelevant
' Creator : implemented
' FileName : implemented
' FileType : implemented
' FileTypes : implemented
' FileTypeSpecify : add by DWF
' FoundFiles : implemented
' LastModified : implemented
' LastModifiedSpecify : added by DWF
' LastModifiedSpecifyEnd : added by DWF
' LastModifiedSpecifyStart : added by DWF
' LookIn : implemented
' MatchAllWordForms : not implemented
' MatchTextExactly : implemented
' PropertyTests : not implemented
' SearchFolders : not implemented
' SearchPropertiesOnly : added by DWF
' SearchScopes : not implemented
' SearchSubFolders : implemented
' SearchTextOnly : added by DWF
' TextOrProperty : implemented
'
' METHODS
' Execute : implemented -- extended by DWF
' NewSearch : implemented
' RefreshScopes : not implemented
I could not implement MatchAllWordForms, unfortunately, as that
depends on functionality that is not exposed anywhere that I know
of. I didn't feel that the SearchFolders and SearchScopes
collections collections were worth implementing, as they are aspects
of the FileSearch object as it exists that make it incredibly
complicated to work with. I also didn't implement the PropertyTests
collection as I couldn't see much utility in it. Because I didn't
implement the SearchScopes collection, there was no need to
implement the RefreshScopes method.
BTW, if you're wondering about the reference for the Office
FileSearch object, all you need to do is open the VBAOF11.CHM (or 10
or 12) help file and search for "filesearch." The reference is
pretty complete, and it's what I used to work things out.
I have more comments below, but if you're curious, you can download
it here:
http://dfenton.com/DFA/download/Access/FileSearch.zip
To work as designed, you also need to download the DSO OLE Document
Properties Reader:
http://support.microsoft.com/kb/224351 (version 2.1)
and register the DSOFile.dll. The class module should be able to be
used for plain text searches without needing to install and register
this DLL (I used late binding so that if it's not registered, it
doesn't cause fatal errors for non-property-based searches).
Comments:
I implemented two of the Office enumerations, for MsoLastModified
and for MsoFileType. Both raise significant issues.
MsoLastModified
===============
I don't know how to interpret the meaning of the constants, which
have these names:
msoLastModifiedAnyTime
msoLastModifiedLastMonth
msoLastModifiedLastWeek
msoLastModifiedThisMonth
msoLastModifiedThisWeek
msoLastModifiedToday
msoLastModifiedYesterday
(I also added msoLastModifiedSpecify since it was crucial to
implement additional functionality that I wanted to add)
AnyTime, Today and Yesterday are quite clear, of course. But it's
not clear to me hnow LastMonth, LastWeek, ThisMonth and ThisWeek
should be interpreted. I interpreted them respectively as:
msoLastModifiedLastMonth
: Between DateAdd("m", -2, Date And DateAdd("m", -1, Date
msoLastModifiedLastWeek
: Between DateAdd("ww", -2, Date And DateAdd("ww", -1, Date
msoLastModifiedThisMonth
: Between DateAdd("m", -1, Date And Date()
msoLastModifiedThisWeek
: Between DateAdd("ww", -1, Date And Date()
Another interpretation for the months would be that the current
month would be May, and the previous month would be April, and
likewise with week (current week beginning on Monday, May 25th, and
last week being the week beginning Monday, May 18th).
I don't know which makes more sense at all.
The logic behind it is in the class module's LastModified Property
Let and anyone could easily alter the definitions there to suit
their needs. MS hasn't implemented it consistently in any of the
places I looked, which included the WinXP search companion, Windows
Desktop Search and the Office FileSearch UI -- each one has
different choices and they aren't consistent with the choices
defined for the Office FileSearch object. I didn't test what the
results were, because I didn't have files with appropriate dates to
check on. I figured it's the kind of thing you'll either not use at
all (in which case it doesn't matter how I implement it), or it's
something you'll use but probably want to adjust the interpretation
for your own users. In other words, no matter what I implemented,
it's probably not going to get used in the exact form I did it if
it's used at all.
Anyway, in the sample database, I implemented a combo box of the
enumeration types, while also allowing the user to type in a
specific date. If you enter anything into the combo box other than
one of the predefined choices, the combo box's AfterUpdate event
assumes you want intended to type a specific date, and checks to see
if it's valid. If not, it prompts you to enter a valid date. Once
the valid date is entered, the TO textbox is enabled so you can
enter a range.
MsoFileType
===========
This is an enumeration of all the MS Office file types. It is
documented here:
http://msdn.microsoft.com/es-es/library/microsoft.office.core.msofile
type(VS.80).aspx
(I don't know why I ended up with the Spanish version)
I used that as a guide, but didn't follow it exactly. I also added
the OpenXML document extensions (e.g., xlsx, docx, etc.). For anyone
who uses it, note that the .FileType property handles only the file
types defined in the enumeration, which are:
msoFileTypeAllFiles
msoFileTypeBinders
msoFileTypeCalendarItem
msoFileTypeContactItem
msoFileTypeDatabases
msoFileTypeDataConnectionFiles
msoFileTypeDesignerFiles
msoFileTypeDocumentImagingFiles
msoFileTypeExcelWorkbooks
msoFileTypeJournalItem
msoFileTypeMailItem
msoFileTypeNoteItem
msoFileTypeOfficeFiles
msoFileTypeOutlookItems
msoFileTypePhotoDrawFiles
msoFileTypePowerPointPresentations
msoFileTypeProjectFiles
msoFileTypePublisherFiles
msoFileTypeTaskItem
msoFileTypeTemplates
msoFileTypeVisioFiles
msoFileTypeWebPages
msoFileTypeWordDocuments
I added a custom type:
msoFileTypeUserSpecified
so that the class module had a mechanism for handling other file
types. The .FileTypeSpecify property takes a single extension or a
comma-delimited list of extensions. The extension can be passed with
or without the wildcard character (everything before the . is
ignored). So you could assign "doc" or "*.doc" to the property and
it would have the same result, and likewise, "doc, csv, txt" or
"*.doc, *.csv, *.txt".
As I said, I interpreted the meaning of these constants more broadly
than the documentation said. For instance, there was no Excel
Spreadsheet choice that defined xls, xlt, etc. as the valid file
types, but only the ExcelWorkbooks constant. So I defined that as
being these extensions: xls, xlt, wbk, xlsx, xlsm, xltx, xltm, xlsb,
xlam (I don't know if there is an OpenXML wbk format -- I can't find
wbkx in Google, except as radio station call letters!).
If I wanted to provide a wide selection of file types beyond the
Office file types, it would be better to utilize only the
..FileTypeSpecify property since you could, for instance, provide a
multiselect listbox that would allow the selection of multiple file
types and then simply pass a comma-delimited list to the property. I
implemented the enumeration of Office documents in order to be
consistent with the Office FileSearch object but this somewhat
restricts the use of the .FileType property of the class module.
However, it does insure that the interface is completely consistent
in that regard with the FileSearch object -- as a replacement for
it, it will work fine.
Search File Properties
======================
In order to provide the ability to search OLE document properties, I
utilized an unsupported Microsoft ActiveX DLL, the DSO OLE Document
Properties Reader:
http://support.microsoft.com/kb/224351 (version 2.1)
As I said above, I wrote the class module with late binding so that
if it's not registered, it shouldn't be fatal. Without it, the
property searching doesn't work.
The implementation of this was very complicated, as I had to convert
VB6 code to figure out how it works, and it turned out that the
collection that the object returns (if you browse it with the object
viewer, it's the DSOFile.OleDocumentProperties.SummaryProperties
collection. I was never able to loop through that collection as
opposed to the DSOFile.OleDocumentProperties.CustomProperties
collection, which worked just fine), and had to hard code each
property retrieval one at a time. This resulted in some *very* ugly
code. Here's the code for checking date properties:
Private Function SearchPropertiesDate(dteSearch As Date, _
Optional strPropertyName As String) As Boolean
Dim bolSingleProperty As Boolean
Dim bolTemp As Boolean
Dim dteLastSaved As Date
bolSingleProperty = (Len(strProperty) > 0)
If bolSingleProperty Then
Select Case strProperty
Case "DateLastSaved"
GoTo DateLastSaved
Case "DateCreated"
GoTo DateCreated
Case "DateLastPrinted"
GoTo DateLastPrinted
End Select
End If
If dteSearch = 0 Then GoTo exitRoutine
DateLastSaved:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastSaved,dteSearch)
If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateCreated:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateCreated, dteSearch)
If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateLastPrinted:
bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastPrinted,
dteSearch) If bolTemp Or bolSingleProperty Then GoTo exitRoutine
exitRoutine:
SearchPropertiesDate = bolTemp
End Function
Let me explain that code.
First, when searching all the properties, I wanted it to short
circuit -- that is, as soon as the first property matched the
criteria, I wanted to return True and exit. This is why
Second, I also wanted to be able to provide the capability of
searching for specific properties. The only way I could think to do
that was with GoTos, which is why it's so incredibly ugly!
But it's all caused by the lack of looping through the collection.
The code sample that came with the download of the DLL also didn't
walk through the SummaryProperties collection, so I doubt that it's
possible. If someone can figure it out, I'd gladly rework that code
to make it less stupid! For now, it does work, though.
Search Results May Not Be Identical to FileSearch Object
=========================================================
I've tried to be as flexible and commonsensical as possible in my
implementation of the searching, but there's a major interaction
between the logic of the class module's searching functionality and
UI. Let me explain.
The main guts of the search process is in a subroutine called
SearchFileForText. In it, first the files are searched for text,
then for last update (through the Access FileDateTime() function),
and then for the OLE properties. Whether or not SearchFileForText
returns true depends on two things:
1. the results of the three searches.
2. the appropriate combination of those three results.
Now, because there was no way for me to have a text search ignore
properties embedded in the header of the document, I made a decision
to allow a choice only between searching Properties Only or Both
Properties and Text. This is the way the FileSearch object actually
implements it, so I would tend to think it's not doing anything
special about reading only certain parts of the OLE stream.
I've tried to implement things in a way that makes sense and that
behaves sensibly, but you all know how that goes -- when you're
heads down programming on a project, you lose sight of the bigger
picture.
I hope that some of you will test it out and find bugs and maybe
figure out how to make it work better or more efficiently.