D
DC
Hi,
I am programming a search catalogue with 200000 items (and growing). I
am currently using the SQL Server 2000 fulltext engine for this task
but it does not fit the requirements anymore.
The products typically do have a verbose name, "canadian superapples:
red tasty juicy macintosh apple from toronto" and the like. If a
customer is looking for "canadian apple" this product needs to match,
but also if he is looking for "juicy mac".
In SQL Server I am using the clause CONTAINSTABLE(products, name,
'"juicy*" ~ "mac*", 50000) to archieve this but first of all this is
not really fast (I guess the fulltext index service is more into
really long text - or just too slow) and I am running into the
limitation that people search for "green apple" and there are 80.000
products in the database which contain the word "green" in their name,
therefore sql server will come back with an empty result table (since
the fulltext service only digs up to about 20000 items and breaks
thereafter).
I would also love to get some search tolerance, if people enter
"africa banona" it would be great if the algorithm could ignore the
typo.
I am wondering if it is time to implement my own search routine for
that, and that is why I post this message to the csharp and the
sqlserver newsgroup. I am free to use any .net and/or sql server
solution.
I thought it would be a good idea to split the names into single
expressions and use a 1:n relation between the expressions and the
matching products. So I can use a clustered index search on the table
with the expressions. The search for "juicy mac" would then get a lock
on "juicy" which matches 20000 products and macintosh which matches
5000 products and the intersection of the two expressions is the
result list of 3000 products. It turned out, that this solution is
slower than using the fulltext service.
As you can see I am not particulary strong about search algorithms
(database spoiled), so any comments on how to get started would help
me a lot!
Regards
Derek
I am programming a search catalogue with 200000 items (and growing). I
am currently using the SQL Server 2000 fulltext engine for this task
but it does not fit the requirements anymore.
The products typically do have a verbose name, "canadian superapples:
red tasty juicy macintosh apple from toronto" and the like. If a
customer is looking for "canadian apple" this product needs to match,
but also if he is looking for "juicy mac".
In SQL Server I am using the clause CONTAINSTABLE(products, name,
'"juicy*" ~ "mac*", 50000) to archieve this but first of all this is
not really fast (I guess the fulltext index service is more into
really long text - or just too slow) and I am running into the
limitation that people search for "green apple" and there are 80.000
products in the database which contain the word "green" in their name,
therefore sql server will come back with an empty result table (since
the fulltext service only digs up to about 20000 items and breaks
thereafter).
I would also love to get some search tolerance, if people enter
"africa banona" it would be great if the algorithm could ignore the
typo.
I am wondering if it is time to implement my own search routine for
that, and that is why I post this message to the csharp and the
sqlserver newsgroup. I am free to use any .net and/or sql server
solution.
I thought it would be a good idea to split the names into single
expressions and use a 1:n relation between the expressions and the
matching products. So I can use a clustered index search on the table
with the expressions. The search for "juicy mac" would then get a lock
on "juicy" which matches 20000 products and macintosh which matches
5000 products and the intersection of the two expressions is the
result list of 3000 products. It turned out, that this solution is
slower than using the fulltext service.
As you can see I am not particulary strong about search algorithms
(database spoiled), so any comments on how to get started would help
me a lot!
Regards
Derek