HTML natural language search engine

Rob Nicholson · Sep 4, 2005

I have a folder on a network share containing many HTML documents, in lots
of sub-folders. I'd like to implement a simple search facility whereby the
user can carry out a natural language search on the HTML documents. I'm
guessing the solution would be a robot/crawler service running somewhere and
a search API.

Anyone got any recommendations? Preferably free or, if not, without huge
royalties/license fees.

I don't need anything hugely complex - just basic cataloging of the HTML
documents and a mechanism to search and return matching documents with
scores. A bonus would be a summary of the document.
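As a rough illustration of the kind of cataloging and scored search described above, here is a minimal Python sketch. It is not a recommendation of any particular product. The names `Index`, `html_to_text` and `tokenize` are hypothetical, the scoring is a plain TF-IDF sum, and the "summary" is simply the first 160 characters of the page text. It only matches keywords, so real natural language handling (stemming, synonyms, phrases) is out of scope here.

```python
import math
import os
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    """In-memory inverted index (hypothetical helper, not a real product)."""

    def __init__(self):
        self.texts = {}                       # doc id -> plain text
        self.postings = defaultdict(Counter)  # term -> {doc id: count}

    def add(self, doc_id, html):
        text = html_to_text(html)
        self.texts[doc_id] = text
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def add_folder(self, root):
        """Catalog every .htm/.html file under root, including sub-folders."""
        for folder, _dirs, files in os.walk(root):
            for name in files:
                if name.lower().endswith((".htm", ".html")):
                    path = os.path.join(folder, name)
                    with open(path, encoding="utf-8", errors="replace") as f:
                        self.add(path, f.read())

    def search(self, query, limit=10):
        """Return (doc id, score, summary) tuples, best match first."""
        scores = Counter()
        n = len(self.texts)
        for term in set(tokenize(query)):
            docs = self.postings.get(term)
            if not docs:
                continue
            idf = math.log(1 + n / len(docs))  # rarer terms weigh more
            for doc_id, count in docs.items():
                scores[doc_id] += count * idf
        return [(doc_id, score, self.texts[doc_id][:160])
                for doc_id, score in scores.most_common(limit)]
```

Under these assumptions, usage would be something like `idx = Index()`, then `idx.add_folder(...)` pointed at the share, then `idx.search("holiday policy")`. The index lives only in memory, so a real service would need to persist it and re-index changed files.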

Guess I'm looking for a Google type system and API.

Thanks, Rob.

Rob Nicholson · Sep 4, 2005

Guess I'm looking for a Google type system and API.

PS. We use the Windows indexing service and query bot on our intranet
running NT 4. I'm not very impressed with its natural language search
facilities: it sometimes doesn't bring back documents that you know are
there and that contain the keywords you're entering. Is the indexing service
a) accessible from an application (I think it is) and b) improved beyond
NT 4?

Thanks, Rob

Alan Silver · Sep 12, 2005

I've done quite a bit of evaluating of search solutions but no actual
deployments, so I can't say much about any 'gotchas', but the following
may be useful to consider.

<snip>

What I can't understand is why MS, who are developing so fiercely for the
Internet, haven't included something as obvious as a facility for
indexing the final content of a web site, however it was generated. All
it needs is a service that will pull the pages through the web server
and index them. Given the programming power at MS's disposal, this
should have been done years ago.

Just my personal gripe.
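To make the "pull the pages through the web server" idea concrete, here is a minimal Python sketch of such a service. It is only an illustration under stated assumptions: `crawl`, `fetch` and `LinkParser` are hypothetical names, the crawl stays on the starting host, and every response is treated as UTF-8 HTML. Because pages are requested over HTTP, dynamically generated pages arrive as the final markup a browser would see.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Request the page from the web server and return its markup."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Yield (url, html) for pages reachable on the same host as start_url."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # urllib's URLError is a subclass of OSError
            continue
        fetched += 1
        yield url, html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

Each `(url, html)` pair yielded by `crawl` could then be handed to whatever indexer is in use. The sketch ignores robots.txt, authentication and non-HTML content types, all of which a real service would have to handle.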

George · Sep 12, 2005

MS has an Index Server that indexes the Web.
We are using it and it works very well.

George.

Alan Silver · Sep 12, 2005

MS has an Index Server that indexes the Web.
We are using it and it works very well.

Does it? I thought Index Server only indexed files on your hard disk.
Will IS pull files through the web server before indexing them?

Guest · Oct 29, 2005

I concur that MS is well behind in indexing and search solutions.
We had to outsource because of this.
Why is there nothing in .NET 2.0?
