Is Windows Vista index-based full-text search powerful enough?

Peter Frank

Hi,

I have a couple of questions about the new index-based full-text
search of Windows Vista.

1) Is it powerful enough to handle huge amounts of data consisting of
PDF documents, Word, Excel and PowerPoint files (around 20 GB)? Or
would a third-party solution like dtSearch be the better choice? If
this is the better choice, can the indexing by Windows Vista be
disabled?

2) Is there any way I can manage or control the indexing process?
a) Can I set the location of the index files?
b) Can I create multiple indexes?
c) Can I control in any way when the indexing takes place?

3) Can I perform advanced searches using Boolean operators?

Peter
 
Dave Wood [MS]

To answer your questions briefly:

- The Windows Search indexer should be able to handle these kinds of
scenarios. If you decide you don't want it to run you need to disable the
Windows Search service.

- You can control what locations are indexed through the Indexing Options
Control Panel, or programmatically. We don't currently support multiple
indexes. I think there's some programmatic control over when indexing
happens; it depends on exactly what scenario you are trying to achieve.
The root of the docs on Windows Search is here:
http://msdn2.microsoft.com/en-us/library/aa965362.aspx

- Yes we support a pretty rich query syntax, an overview of which is here:
http://windowshelp.microsoft.com/Windows/en-US/Help/73106209-6df0-432a-8cb7-df5d8ce02ec61033.mspx

I hope this helps,

Dave Wood
 
Guest

This might be a better reference. Can it still do document summaries like
the 2000/XP Index Server?



Advanced Query Syntax
The Advanced Query Syntax (AQS) is used by Microsoft Windows Desktop Search
(WDS) to help users and programmers better define and narrow their searches.
Using AQS is an easy way to narrow searches and deliver better result sets.
Searches can be narrowed by the following parameters:
File kinds: folders, documents, presentations, pictures and so on.
File stores: specific databases and locations.
File properties: size, date, title and so on.
File contents: keywords like "project deliverables," "AQS," "blue suede
shoes," and so on.
Furthermore, search parameters can be combined using search operators. The
remainder of this section explains the query syntax, the parameters and
operators, and how they can be combined to offer targeted search results.
The tables describe the syntax to use with WDS, as well as the properties
that can be queried for each file kind displayed in the Windows Desktop
Search results window.
Desktop Search Syntax
A search query can include one or more keywords, with Boolean operators and
optional criteria. These optional criteria can narrow a search based on the
following:
Scope or data store in which files reside
Kinds of files
Managed properties of files
The optional criteria, described in greater detail below, use the
following syntax:
<scope name>:<value>
<file kind>:<value>
<property name>:<value>
Suppose a user wants to search for a document containing the phrase "last
quarter," created by John or Joanne, and that the user saved to the folder
mydocuments. The query may look like this:
"last quarter" author:(john OR joanne) foldername:mydocuments
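If you generate queries from a script, the example above can be assembled the same way. This is a minimal Python sketch; build_aqs_query is a hypothetical helper (not part of WDS or any Windows API) that only produces the query string, which you would then type into the search box or hand to whatever search interface you use.

```python
def build_aqs_query(phrase=None, **criteria):
    """Compose an AQS query string from a phrase and property criteria.

    phrase   -- optional exact phrase, wrapped in quotation marks
    criteria -- property name -> value, or -> list of alternatives to OR
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')
    for prop, value in criteria.items():
        if isinstance(value, (list, tuple)):
            # Alternatives go in parentheses; AQS wants OR in uppercase.
            value = "(" + " OR ".join(value) + ")"
        elif " " in value:
            # Multi-word values are quoted so they are matched as a phrase.
            value = f'"{value}"'
        parts.append(f"{prop}:{value}")
    return " ".join(parts)


print(build_aqs_query("last quarter",
                      author=["john", "joanne"],
                      foldername="mydocuments"))
# "last quarter" author:(john OR joanne) foldername:mydocuments
```

Keyword arguments keep their order (Python 3.7+), so the criteria appear in the query in the order you pass them.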
Scope: Locations and Data Stores
Users can limit the scope of their searches to specific folder locations or
data stores. For example, if you use several e-mail accounts and you want to
limit a query to either Microsoft Outlook or Outlook Express, you can use
store:outlook or store:oe, respectively.
Restrict Search by Data Store | Use | Example
Desktop | desktop | store:desktop
Files | files | store:files
Outlook | outlook | store:outlook
Outlook Express | oe | store:oe
Specific Folder | foldername or in | foldername:MyDocuments or in:MyDocuments

If you have a protocol handler in place to crawl custom stores, like Lotus
Notes, you can use the name of the store or protocol handler for the store.
For example, if you implemented a protocol handler to include a Lotus Notes
data store as "notes," the query syntax would be store:notes.
Common File Kinds
Users can also limit their searches to specific types of files, called file
kinds. The following table lists the file kinds and offers examples of the
syntax used to search for these kinds of files.
To Restrict by File Type | Use | Example
All file types | everything | kind:everything
Communications | communications | kind:communications
Contacts | contacts | kind:contacts
E-mail | email | kind:email
Instant Messenger conversations | im | kind:im
Meetings | meetings | kind:meetings
Tasks | tasks | kind:tasks
Notes | notes | kind:notes
Documents | docs | kind:docs
Text documents | text | kind:text
Spreadsheets | spreadsheets | kind:spreadsheets
Presentations | presentations | kind:presentations
Music | music | kind:music
Pictures | pics | kind:pics
Videos | videos | kind:videos
Folders | folders | kind:folders
Folder name | foldername or in | foldername:mydocs or in:mydocs
Favorites | favorites | kind:favorites
Programs | programs | kind:programs

Boolean Operators
Search keywords and file properties can be combined to broaden or narrow a
search with operators. The following table explains common operators used in
a search query.
Keyword/Symbol | Example | Function
NOT | social NOT security | Finds items that contain social, but not security.
- | social -security | Finds items that contain social, but not security.
OR | social OR security | Finds items that contain social or security.
Quotation marks | "social security" | Finds items that contain the exact phrase social security.
Parentheses | (social security) | Finds items that contain social and security in any order.
> | date:>11/05/04 | Finds items with a date after 11/05/04.
> | size:>500 | Finds items with a size greater than 500 bytes.
< | date:<11/05/04 | Finds items with a date before 11/05/04.
< | size:<500 | Finds items with a size less than 500 bytes.
.. | date:11/05/04..11/10/04 | Finds items with a date beginning on 11/05/04 and ending on 11/10/04.
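To make the comparison operators concrete, here is a local Python sketch of what >, < and .. mean for a numeric property such as size (in bytes, as the table states). It is not how WDS evaluates queries internally; size_predicate is a hypothetical helper, and treating the .. range as inclusive is an assumption based on the "beginning on ... and ending on ..." wording.

```python
def size_predicate(expr):
    """Return a test for a byte count, from the value of a size: criterion.

    Handles the forms shown in the operators table: ">500", "<500" and a
    "low..high" range. Treating the range as inclusive is an assumption.
    """
    expr = expr.strip()
    if ".." in expr:
        low, high = (int(part) for part in expr.split("..", 1))
        return lambda size: low <= size <= high
    if expr.startswith(">"):
        limit = int(expr[1:])  # int() tolerates a space, as in "size:> 50"
        return lambda size: size > limit
    if expr.startswith("<"):
        limit = int(expr[1:])
        return lambda size: size < limit
    raise ValueError(f"unsupported size expression: {expr!r}")


sizes = [120, 500, 501, 2048]
print([s for s in sizes if size_predicate(">500")(s)])      # [501, 2048]
print([s for s in sizes if size_predicate("<500")(s)])      # [120]
print([s for s in sizes if size_predicate("100..500")(s)])  # [120, 500]
```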

Note
The operators NOT and OR must be in uppercase and cannot be combined in one
query (e.g., social OR security NOT retirement).
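A script that builds queries can check the rule in this note before submitting them. The following Python sketch is a hypothetical pre-flight check (check_aqs_operators is not a WDS function); ignoring operator words inside quotation marks is an assumption based on quotes marking an exact phrase.

```python
import re


def check_aqs_operators(query):
    """List problems with NOT/OR usage, per the note on operators."""
    # Quoted text is an exact phrase, so operator words inside it are ignored.
    unquoted = re.sub(r'"[^"]*"', " ", query)
    # Split on whitespace and parentheses, e.g. "author:(john or joanne)".
    tokens = [t for t in re.split(r"[\s()]+", unquoted) if t]
    problems = []
    for word in sorted({t for t in tokens
                        if t.upper() in ("NOT", "OR") and t != t.upper()}):
        problems.append(f"'{word}' must be written in uppercase to act as an operator")
    if "NOT" in tokens and "OR" in tokens:
        problems.append("NOT and OR cannot be combined in one query")
    return problems


print(check_aqs_operators("social OR security NOT retirement"))
# ['NOT and OR cannot be combined in one query']
```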
Boolean Properties
Some file types let users search for files using Boolean properties, as
described in the following table.
Property | Example | Function
is:attachment | report is:attachment | Finds items that have attachments that contain report. Same as isattachment:true.
isonline: | report isonline:true | Finds items that are online and which contain report.
isrecurring: | report isrecurring:true | Finds items that are recurring and which contain report.
isflagged: | report isflagged:true | Finds items that are flagged (Review, Follow up, for example) and which contain report.
isdeleted: | report isdeleted:true | Finds items that are flagged as deleted (Recycle Bin or Deleted Items, for example) and which contain report.
iscompleted: | report iscompleted:false | Finds items that are not flagged as complete and which contain report.
hasattachment: | report hasattachment:true | Finds items containing report and having attachments.
hasflag: | report hasflag:true | Finds items containing report and having flags.
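The table states that is:attachment is the same as isattachment:true, and the property tables further on list is:read, is:completed and has:flag alongside isread, iscompleted and hasflag. Assuming that pattern holds in general (an inference from the tables, not documented behavior), a query can be normalized to the explicit form with a small Python sketch; expand_shorthand is a hypothetical helper.

```python
import re


def expand_shorthand(query):
    """Rewrite is:X and has:X shorthands as isX:true and hasX:true."""
    # \b stops the pattern from firing inside longer words such as "this:".
    return re.sub(r"\b(is|has):(\w+)", r"\1\2:true", query)


print(expand_shorthand("report is:attachment"))  # report isattachment:true
print(expand_shorthand("report has:flag"))       # report hasflag:true
```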

Dates
In addition to searching on specific dates and date ranges using the
operators described earlier, AQS allows relative date values (like today,
tomorrow, or next week), as well as day values (like Tuesday or
Monday..Wednesday) and month values (like February).
Relative to | Syntax Example | Result
Day | date:today | Finds items with today's date.
Day | date:tomorrow | Finds items with tomorrow's date.
Day | date:yesterday | Finds items with yesterday's date.
Week/Month/Year | date:this week | Finds items with a date falling within the current week.
Week/Month/Year | date:last week | Finds items with a date falling within the previous week.
Week/Month/Year | date:next month | Finds items with a date falling within the upcoming month.
Week/Month/Year | date:past month | Finds items with a date falling within the previous month.
Week/Month/Year | date:coming year | Finds items with a date falling within the upcoming year.
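A relative value can be turned into an explicit date or .. range, which is handy when a script needs a fixed, reproducible query. The following Python sketch is hypothetical (relative_to_range is not a WDS function) and assumes weeks run Monday to Sunday; how Windows actually defines "this week" may depend on locale settings. The MM/DD/YY format matches the examples in the operators table.

```python
from datetime import date, timedelta


def relative_to_range(phrase, today):
    """Return an explicit date: criterion for a relative date phrase."""
    if phrase == "today":
        start = end = today
    elif phrase == "yesterday":
        start = end = today - timedelta(days=1)
    elif phrase == "tomorrow":
        start = end = today + timedelta(days=1)
    elif phrase in ("this week", "last week"):
        # Assumption: weeks start on Monday (real behavior may follow locale).
        monday = today - timedelta(days=today.weekday())
        if phrase == "last week":
            monday -= timedelta(weeks=1)
        start, end = monday, monday + timedelta(days=6)
    else:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    fmt = "%m/%d/%y"
    if start == end:
        return f"date:{start.strftime(fmt)}"
    return f"date:{start.strftime(fmt)}..{end.strftime(fmt)}"


# 11/10/04 was a Wednesday.
print(relative_to_range("last week", date(2004, 11, 10)))
# date:11/01/04..11/07/04
```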

Properties by File Kind
Users can search on specific properties of different file kinds. Some
properties (like file size) are common to all files, while others are
limited to a specific kind. Slide count, for example, is specific to
presentations. The following tables list these properties by file kind.
File Kind: Everything
These are properties common to all file kinds. To include all types of files
in a query, the syntax is:
kind:everything <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Title | title, subject or about | title:"Quarterly Financial"
Status | status | status:complete
Date | date | date:last week
Date modified | datemodified or modified | modified:last week
Importance | importance or priority | importance:high
Size | size | size:> 50
Deleted | deleted or isdeleted | isdeleted:true
Is attachment | isattachment | isattachment:true
To | to or toname | to:bob
Cc | cc or ccname | cc:john
Company | company | company:Microsoft
Location | location | location:"Conference Room 102"
Category | category | category:Business
Keywords | keywords | keywords:"sales projections"
Album | album | album:"Fly by Night"
File name | filename or file | filename:MyResume
Genre | genre | genre:rock
Author | author or by | author:"Stephen King"
People | people or with | with:(sonja OR david)
Folder | folder, under or path | folder:downloads
File extension | ext or fileext | ext:.txt

Attachment
These are properties common to attachments. To limit the search to
attachments only, the syntax is:
kind:attachment <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
People | people or with | people:john or with:john

Contacts
These are properties common to contacts. To limit the search to contacts
only, the syntax is:
kind:contacts <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Job title | jobtitle | jobtitle:CFO
IM address | imaddress | imaddress:[email protected]
Assistant's phone | assistantsphone | assistantsphone:555-3323
Assistant name | assistantname | assistantname:paul
Profession | profession | profession:plumber
Nickname | nickname | nickname:Tex
Spouse | spouse | spouse:Debbie
Business city | businesscity | businesscity:Seattle
Business postal code | businesspostalcode | businesspostalcode:98006
Business home page | businesshomepage | businesshomepage:www.microsoft.com
Callback phone number | callbackphonenumber | callbackphonenumber:555-555-2121
Car phone | carphone | carphone:555-555-2121
Children | children | children:Timmy
First name | firstname | firstname:John
Last name | lastname | lastname:Doe
Home fax | homefax | homefax:555-555-2121
Manager's name | managersname | managersname:John
Pager | pager | pager:555-555-2121
Business phone | businessphone | businessphone:555-555-2121
Home phone | homephone | homephone:555-555-2121
Mobile phone | mobilephone | mobilephone:555-555-2121
Office | office | office:sample
Anniversary | anniversary | anniversary:1/1/06
Birthday | birthday | birthday:1/1/06
Web page | webpage | webpage:www.microsoft.com

Note
Phone numbers are indexed as entered. For example, if a user did not include
a country or area code when entering a phone number, the contact cannot be
found by a search that includes the country or area code in the phone
number.
Communications
These are properties common to communications. To limit the search to
communications only, the syntax is:
kind:communications <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
From | from or organizer | from:john
Received | received or sent | sent:yesterday
Subject | subject or title | subject:"Quarterly Financial"
Has attachment | hasattachments or hasattachment | hasattachment:true
Attachments | attachments or attachment | attachment:presentation.ppt
Bcc | bcc, bccname or bccaddress | bcc:dave
Cc address | ccaddress or cc | ccaddress:[email protected]
Follow-up flag | followupflag | followupflag:2
Due date | duedate or due | due:last week
Read | read or isread | is:read
Is completed | iscompleted | is:completed
Incomplete | incomplete or isincomplete | is:incomplete
Has flag | hasflag or isflagged | has:flag
Duration | duration | duration:> 50

Calendar
These are properties common to calendars. To limit the search to calendars
only, the syntax is:
kind:calendar <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Recurring | recurring or isrecurring | is:recurring
Organizer | organizer, by or from | organizer:debbie

Documents
These are properties common to documents. To limit the search to documents
only, the syntax is:
kind:documents <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Comments | comments | comments:"needs final review"
Last saved by | lastsavedby | lastsavedby:john
Document manager | documentmanager | documentmanager:john
Revision number | revisionnumber | revisionnumber:1.0.3
Document format | documentformat | documentformat:MIMETYPE
Date last printed | datelastprinted | datelastprinted:last week

Presentation
These are properties common to presentations. To limit the search to
presentations only, the syntax is:
kind:presentation <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Slide count | slidecount | slidecount:>20

Music
These are properties common to music files. To limit the search to music
only, the syntax is:
kind:music <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Bit rate | bitrate, rate | bitrate:192
Artist | artist, by or from | artist:John Singer
Duration | duration | duration:3
Album | album | album:"greatest hits"
Genre | genre | genre:rock
Track | track | track:12
Year | year | year:> 1980 < 1990

Picture
These are properties common to pictures. To limit the search to pictures
only, the syntax is:
kind:picture <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Camera make | cameramake | cameramake:sample
Camera model | cameramodel | cameramodel:sample
Dimensions | dimensions | dimensions:8X10
Orientation | orientation | orientation:landscape
Date taken | datetaken | datetaken:yesterday
Width | width | width:1600
Height | height | height:1200

Video
These are properties common to videos. To limit the search to videos only,
the syntax is:
kind:video <property>:<value>
where <property> is a property listed below and <value> is the
user-specified search term.
Property | Use | Example
Name | name, subject | name:"Family Vacation to the Beach 05"
Ext | ext, fileext | ext:.avi

Related Topics
Perceived Types
WDS Schema
Calling WDS from the Command Line
Calling WDS from Web Pages


©2007 Microsoft Corporation. All rights reserved.
 
Dennis

Peter Frank said:
Hi,

I have a couple of questions about the new index-based full-text
search of Windows Vista.

1) Is it powerful enough to handle huge amounts of data consisting of
PDF documents, Word, Excel and Powerpoint files (around 20 GB)? Or

I have indexing enabled on my data drive.
It contains approx. 44 GB total.
16 GB is documents, mostly PDFs and other smaller files.
The rest is mainly larger files > 10 MB in size.

And it works like a charm :) , that is, once it was done indexing :))

Search results are VERY fast. I'm really impressed; I was kind of
expecting to have to turn indexing off for that drive again.

Regards
 
Guest

It doesn't do PDFs by default.
Dennis said:
I have indexing enabled on my data drive.
It contains approx. 44 GB total.
16 GB is documents, mostly PDFs and other smaller files.
The rest is mainly larger files > 10 MB in size.

And it works like a charm :) , that is, once it was done indexing :))

Search results are VERY fast. I'm really impressed; I was kind of
expecting to have to turn indexing off for that drive again.

Regards
 
Dennis

Guest said:
It doesn't do PDFs by default.


Nonsense.

It does.

I haven't changed ANY options other than adding the data drive to
indexed locations.

Needless to say, Acrobat Reader has to be installed.


Regards
 
Guest

If one needs to install Acrobat for it to work, then it DOES NOT do it by
default.
 
Dennis

Guest said:
If one needs to install Acrobat for it to work, then it DOES NOT do it by
default.


Dohh....

The OP asked whether it could index PDFs.
Don't you think it's highly likely that he's got either Acrobat
Reader or Pro installed??

Sigh.....
 
Geta Klew

LOL, let alone the fact that he couldn't open and read them after finding
them if he didn't have a reader ;-0
 
Guest

Acrobat 5, which works quickly, can open any PDF I've seen and I see a lot
every day. Installing it doesn't enable indexing of contents.
 
Dennis

Geta Klew said:
LOL, let alone the fact that he couldn't open and read them after finding
them if he didn't have a reader ;-0

Yeah.... ha ha

Well one never knows, if there is a way to construct impossible
scenarios, someone will do it just for the heck of it. :)

Regards
 
Dennis

Guest said:
Acrobat 5, which works quickly, can open any PDF I've seen and I see a lot
every day. Installing it doesn't enable indexing of contents.

I haven't checkmarked PDF myself in Indexing Options for the file types
to be indexed. I just checked now, and the PDF file type is set to be
indexed with properties AND contents.
I did nothing except install Acrobat Reader and add my drive to the
indexed locations.

But, hey, my Vista could be broken :)


Regards
 
Dennis

The default is file properties only. The plain text filter has no utility
with PDFs.

OMG...

Who is talking about plain text??

Whatever. Never mind. It won't do it by default, you are right.

My installation just doesn't know that :)

I think this is the point where I have to rewind my old VHS tapes.
Twice.


Regards
 
Peter Frank

Dennis said:
I haven't checkmarked PDF myself in Indexing Options for the file types
to be indexed. I just checked now, and the PDF file type is set to be
indexed with properties AND contents.
I did nothing except install Acrobat Reader and add my drive to the
indexed locations.

But, hey, my Vista could be broken :)

I do have Acrobat Reader installed, but not the most up-to-date
version. It is not Acrobat 5 but Acrobat 6.0. Will it work with this
version of Acrobat Reader? Or what is the minimum version required?

Peter
 
Peter Frank

Dave Wood said:
To answer your questions briefly:

- The Windows Search indexer should be able to handle these kinds of
scenarios. If you decide you don't want it to run you need to disable the
Windows Search service.

- You can control what locations are indexed through the Indexing Options
Control Panel, or programmatically. We don't currently support multiple
indexes.

OK, that's at least something, because there are many locations on my
hard disk which I wouldn't want to be indexed. If it actually indexed
everything from the first to the last partition, it would be very
inefficient.

I think there's some programmatic control over when indexing
happens; it depends on exactly what scenario you are trying to achieve.

My concern is that re-indexing would slow down my computer
considerably, so I would want Windows to perform indexing only when
the computer is idle. I understand that this could mean that I may
have an obsolete index for some time.

- Yes we support a pretty rich query syntax, an overview of which is here:
http://windowshelp.microsoft.com/Windows/en-US/Help/73106209-6df0-432a-8cb7-df5d8ce02ec61033.mspx

Very good.

I hope this helps,

Yes, it does. Thanks.

Regarding the question about whether I can set where the index files
are placed: I conclude that it cannot be done, i.e., the index files
are necessarily placed on partition C: where Windows Vista is
installed. Is that correct?

Actually, I would prefer to have the index files placed on a different
partition but I suppose this can't be done.

Are there any estimates of how much hard disk space I should reserve
for x GB of documents to be indexed (like 10%, for example, which
would mean I need 1 GB of extra space for every 10 GB of document
data)?
I understand that this depends on the type of data, but as I mentioned
before, the data locations that I would like to be indexed consist
almost exclusively of PDF documents, Word, Excel, and PowerPoint
files.

Peter
 
Peter Frank

Dennis said:
I have indexing enabled on my data drive.
It contains approx. 44 GB total.
16 GB is documents, mostly PDFs and other smaller files.
The rest is mainly larger files > 10 MB in size.

And it works like a charm :) , that is, once it was done indexing :))

Search results are VERY fast. I'm really impressed; I was kind of
expecting to have to turn indexing off for that drive again.

OK, good to hear. However, what about extensive adding, moving,
copying and deleting of files? My scenario would regularly include a
lot of these operations, so I wonder whether this would slow my
computer considerably due to the re-indexing that Windows Vista would
have to perform. By the way, does Windows Vista re-index immediately
after any changes in a location marked for indexing or only when it is
idle or ...?

Peter
 
Dennis

Peter Frank said:
I do have Acrobat Reader installed, but not the most up-to-date
version. It is not Acrobat 5 but Acrobat 6.0. Will it work with this
version of Acrobat Reader? Or what is the minimum version required?

I don't know. I just got the latest version of Acrobat Reader from
Adobe's site. You can probably find info on compatible versions on
their site...


Regards
 
