searching for text in a field - better way?

  • Thread starter Thread starter Tim Zych
  • Start date Start date
T

Tim Zych

Hi. I have a database with search capability (Asp.Net, Access 2000). Type is
a few words and get results using the following approach:

SELECT tblCode.CodeID, tblCode.Title FROM tblCode WHERE tblCode.Code Like
'%word1%' And tblCode.Code Like '%word2%';

where Code is a non indexed memo field.

I am wondering, is there a better approach than using the wildcards? Since
this is being performed on the entire memo column, I would imagine this
might get quite slow on many rows.

It works OK for now, but I would like to find a more robust way.

Any suggestions?

Thanks.
 
Hi. I have a database with search capability (Asp.Net, Access 2000). Type is
a few words and get results using the following approach:

SELECT tblCode.CodeID, tblCode.Title FROM tblCode WHERE tblCode.Code Like
'%word1%' And tblCode.Code Like '%word2%';

where Code is a non indexed memo field.

I am wondering, is there a better approach than using the wildcards? Since
this is being performed on the entire memo column, I would imagine this
might get quite slow on many rows.

It works OK for now, but I would like to find a more robust way.

Any suggestions?

Thanks.

If you have a many (records) to many (codes) relationship, it's
probably best to model it as a many to many relationship with three
tables:

tblCode
CodeID Primary Key
Title

tblKeywords
Keyword Text Primary Key

tblCodeKeywords
CodeID <foreign key to tblCode>
Keyword <foreign key to tblKeywords>
<joint two field primary key>

Searching this structure takes advantage of the index on
tblCodeKeywords.Keyword.

AND logic for multiple keywords gets more complex however:

SELECT tblCode.Title, tblCOde.CodeID
FROM tblCode
INNER JOIN (tblCodeKeywords AS K1 ON K1.CodeID = tblCode.CodeID)
INNER JOIN tblCodeKeywords AS K2 on K2.CodeID = tblCode.CodeID
WHERE K1.Keyword = "WORD1"
AND K2.Keyword = "WORD2"

John W. Vinson[MVP]
 
So I must parse out the keywords every time I change the Code text right?

E.g. every time I add/update that column, must I update tblKeywords and
tblCodeKeywords. (tblKeywords consists of unique values among all code
entries while tblCodeKeywords consists of unique values for just that code?)

Also the query does not "compile". (Syntax error in JOIN operation).

Thaks for your help.
 
So I must parse out the keywords every time I change the Code text right?

Well... if you're storing multiple distinct atomic values in this Code
field, you're violating the first principle of relational database
design: all fields should be atomic, you should store only one value
in a field.

If you choose to violate this principle, you are causing yourself this
problem. There are various workarounds to the problem ( the non
indexed LIKE, or storing data redundantly in the memo field and the
Keywords table) but the root of the problem is the non-normalized
multivalue memo field.
E.g. every time I add/update that column, must I update tblKeywords and
tblCodeKeywords. (tblKeywords consists of unique values among all code
entries while tblCodeKeywords consists of unique values for just that code?)

Right - tblCodeKeywords is a list of all of the keywords which apply
to that particular code.
Also the query does not "compile". (Syntax error in JOIN operation).

I probably got the parentheses wrong; try joining the tblCodeKeywords
table to the Codes table twice, once for each criterion.

I *think* I have a decent picture of the real world situation you're
trying to solve, but if you could describe it, some third solution
might come to mind!

John W. Vinson[MVP]
 
John:

This is a VBA function / code library. The memo field Code contains the VBA
functions and procedures. To find functions, the website performs a text
search within the functions themselves. E.g. I might search for "array" and
"unique" to find all of the functions where both of those words exist.

Based on this new (my apologies for not being more informative up front)
information, would your original approach still apply?

Thanks for the help

Tim
 
John:

This is a VBA function / code library. The memo field Code contains the VBA
functions and procedures. To find functions, the website performs a text
search within the functions themselves. E.g. I might search for "array" and
"unique" to find all of the functions where both of those words exist.

Based on this new (my apologies for not being more informative up front)
information, would your original approach still apply?

Well... only if you need it for performance. Since the Memo field
contains a valid, "atomic" entity - the block of code - my concern
that you had a bunch of keywords dumped into a comma- or
blank-separated textbox evaporates.

If you can get acceptable performance searching the memo field using
wildcards, that may be your best bet, since extracting keywords and
maintaining the (redundant, at the word level) tables would be a big
overhead cost.

John W. Vinson[MVP]
 
It might be fun to see the difference.

I do a lot of text searches, way more than writing to the db. I agree there
would be initial overhead to write and index each new entry or modification,
but I am curious to see the difference in search performance.

Thanks.
 
It might be fun to see the difference.

I do a lot of text searches, way more than writing to the db. I agree there
would be initial overhead to write and index each new entry or modification,
but I am curious to see the difference in search performance.

There are commercially available "textbase" programs (including, IIRC,
add-ons to SQL/Server and Oracle). They tend to be complex, pricey,
and very, very good at searching huge volumes of text. The ones I've
seen (some years back) had neat features like optional automatic
grammatical expansion (e.g. searching for "catch" retrieves records
containing "caught" - in several languages yet!); WITHIN clauses, so
that you can find two words within the same line, sentence, or
paragraph; KWIC, Key Word In Context, and so on.

I'm sure that we could all learn something from Google though!

John W. Vinson[MVP]
 
My needs are relatively simple :)

The wildcard search works pretty well though. I got the search part up an
running so quickly using Like that I figured I must have cheated and that I
should consider a more robust approach as the code library expands or if
others use it.

Thanks for the advice. I always learn something from your posts.

Tim
 
Back
Top