HTML natural language search engine

Rob Nicholson

I have a folder on a network share containing many HTML documents, in lots
of sub-folders. I'd like to implement a simple search facility whereby the
user can carry out a natural language search on the HTML documents. I'm
guessing the solution would be a robot/crawler service running somewhere and
a search API.

Anyone got any recommendations? Preferably free or, if not, without huge
royalties/license fees :)

I don't need anything hugely complex - just basic cataloging of the HTML
documents and a mechanism to search and return matching documents with
scores. A bonus would be a summary of the document.
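To make the requirement concrete, here is a minimal sketch of that kind of cataloging and scored search, using only the Python standard library. It is an illustration under assumptions, not a recommendation of a particular product: the function names (`build_index`, `search`), the TF-IDF style scoring, and the "first 200 characters" summary are all hypothetical choices made for the example.

```python
import math
import os
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(root):
    """Walk root (including sub-folders); return {path: (term counts, summary)}."""
    index = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = html_to_text(fh.read())
                # Assumption: the first 200 characters serve as the summary.
                index[path] = (Counter(tokenize(text)), text[:200])
    return index


def search(index, query):
    """Return [(score, path, summary)], best match first (simple TF-IDF)."""
    n_docs = len(index)
    idf = {}
    for term in set(tokenize(query)):
        df = sum(1 for counts, _ in index.values() if term in counts)
        if df:
            idf[term] = math.log((1 + n_docs) / df)
    results = []
    for path, (counts, summary) in index.items():
        score = sum(counts[t] * w for t, w in idf.items())
        if score > 0:
            results.append((score, path, summary))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Calling `build_index` on the share's root folder and then `search(index, "holiday policy")` would return the matching files with scores. Rare query words count for more than common ones, which is a crude stand-in for natural language ranking rather than a real linguistic analysis.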

Guess I'm looking for a Google type system and API.

Thanks, Rob.
 
Rob Nicholson

Guess I'm looking for a Google type system and API.

PS. We use the Windows indexing service and query bot on our intranet
running NT 4. Not very impressed with its natural language search
facilities, in that it sometimes doesn't bring back documents you know are
there and that contain the keywords you're entering. Is the indexing
service a) accessible from an application (I think it is) and b) improved
beyond NT 4?

Thanks, Rob
 
Alan Silver

I've done quite a bit of evaluating of search solutions but no actual
deployments, so I can't say much about any 'gotchas', but the following
may be useful to consider.
<snip>

What I can't understand is why MS, who are developing so fiercely for the
Internet, haven't included something so obvious as a facility for
indexing the final content of a web site, however it was generated. All
it needs is a service that will pull the pages through the web server
and index them. Given the programming power at MS's disposal, this
should have been done years ago.
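As a rough sketch of what such a service would do (this is not any Microsoft product or API), the following pulls pages through the web server over HTTP and follows same-site links, so dynamically generated pages are captured as the browser would see them. The names `crawl`, `http_fetch` and the `max_pages` limit are assumptions made for the example, and the `fetch` parameter exists only so a different fetcher can be swapped in.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    # Fetch the rendered page from the web server, not the file on disk.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=http_fetch, max_pages=100):
    """Breadth-first crawl of one host; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be retrieved
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The returned HTML could then be handed to whatever indexer is in use. A real service would also need to respect robots.txt, handle non-HTML content types, and re-crawl on a schedule, none of which is shown here.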

Just my personal gripe.
 
George

MS has an Index Server that indexes web content.
We are using it and it works very well.

George.

 
Alan Silver

MS has an Index Server that indexes web content.
We are using it and it works very well.

Does it? I thought Index Server only indexed files on your hard disk.
Will IS pull files through the web server before indexing them?
 
Guest

I concur that MS is well behind in indexing and search solutions.
We had to outsource because of this.
Why is there nothing in .NET 2.0?
 
