Searching for best matches in string query

W

Wes

I need to return a list of best/ranked matches from a field in a table. For
example, if the user searches for "Bob Jones Roofing Company", I'd like it to
return results that may contain all or any of those words, which the best
matches listed first. In that example, "Jones Roofing Company"(3 hits)
should come up before "Bob Smiths Plumbing Company"(2 hits).

I'm not sure of the best approach to accomplish this and I haven't been able
to word my google search term well enough to get meaningful results.

Any help is appreciated.

Thanks.
 
J

Jeff Boyce

Wes

If you were to hire someone to look over your lists and rate/rank the
matches, what instructions would you give them? How do you want them to
decide that the words are "close enough" or not?

What if you had "Robert Jones Woofing Co." as one choice? Is that likely a
close match or something totally different?

Access is much more stupid than that person you hired. You'll have to tell
Access even more about how to decide...

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
W

Wes

Thanks for the reply.

I'm willing to accept exact word matches only. However, I need the results
that have the greatest number of matches. If a user mispells a word or
inputs a similar word, they'll have to account for it when reports run.

I'm considering breaking the search string into an array of words, reading
the entire table into a second array and then steping through this data array
using InStr to determine if the word array items exist. If they do, I'll
re-sort the array, moving matches up and misses down. After stepping through
the data array searching for each item in the word array, I'm hopeful I'll
have the items with the most hits at the top.

Unfortunately, I can see that this is going to be extremely process
intensive and am hopeful there is a better way. Fortunately, I don't think
the data array will grow very fast.
 
L

louisjohnphillips

Thanks for the reply.

I'm willing to accept exact word matches only.  However, I need the results
that have the greatest number of matches.  If a user mispells a word or
inputs a similar word, they'll have to account for it when reports run.

I'm considering breaking the search string into an array of words, reading
the entire table into a second array and then steping through this data array
using InStr to determine if the word array items exist.  If they do, I'll
re-sort the array, moving matches up and misses down.  After stepping through
the data array searching for each item in the word array, I'm hopeful I'll
have the items with the most hits at the top.

Unfortunately, I can see that this is going to be extremely process
intensive and am hopeful there is a better way.  Fortunately, I don't think
the data array will grow very fast.









- Show quoted text -

Your post said "any" help so here's my submission. It is untested and
you should consider it a starting point rather than a definitive
answer.

One way to do this involves a master table for each business:

Create table BusinessMaster
(
BusinessID long,
BusinessName text(50)
)

Load this table with the list of businesses.


A second table would be for BusinessTokens:

Create table BusinessTokens
(
BusinessID long,
BusinessToken Text(50)
)


Make a copy of BusinessMaster called BusinessesWork

SELECT *
INTO BusinessesWork
from BusinessMaster;


Run a process that inserts into BusinessTokens those tokens found in
BusinessMaster.

INSERT INTO BusinessTokens( BusinessID, BusinessToken )
SELECT BusinessID, left( BusinessName, instr( BusinessName,' ' ) -1 )
from BusinessWork
where instr( BusinessName, ' ' ) > 0 ;

Eliminate the token that was just loaded into BusinessTokens

Update BusinessWork
set BusinessName = mid( BusinessName, instr( BusinessName, ' ' ) +
1 ) ;

Run the "INSERT INTO" and "UPDATE" statements again until it the
INSERT Statement fails to yield results.

Modify the INSERT statement to read:

INSERT INTO BusinessTokens( BusinessID, BusinessToken )
SELECT BusinessID, BusinessName
from BusinessWork
where len( BusinessName ) > 0;

These tables were needed to do the work. So try this:

SELECT C.BusinessName, count(*)
from BusinessTokens as A,
BusinessNames as B,
BusinessNames as C
where instr( B.BusinessName, A.BusinessToken ) > 0
and B.BusinessName = [Enter Business Name]
and A.BusinessID = C.BusinessID
Group by C.BusinessName
ORDER BY 2

The B.BusinessName remains constant. It joins to every row in
BusinessTokens where the token if found in the BusinessName. Those
matching rows are projected from the C.BusinessName.
 
J

John Spencer

One method would involve using a temp table that is populated with just
the words you are looking for. You would have to clear the records and
create new records each time you run the query.

TableFindWords with field TheWord
Bob
Jones
Roofing
Company

SELECT YourTable.Company, Count(YourTable.Company) as TimesFound
FROM YourTable INNER JOIN TableFindWords
ON YourTable.Company LIKE "*" & TableFindWords.TheWord & "*"
GROUP BY YourTable.Company
Order By Count(YourTable.Company), YourTable.Company


I'll have to ponder if there are other methods that you could use.

'====================================================
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
'====================================================
 
W

Wes

I think I figured it out. I broke the search string into an array, then
looped through my recordset as an array counting the matches. I then sorted
on the match count, bring the greatest number to the top. I had to adapt
search logic to work with multi-dimensioned arrays. I used a quick sort so I
hope it will scale well. I found the search logic at:
http://www.xtremevbtalk.com/showthread.php?t=78889

The code is still a little rough as I haven't finished the rest of the
module and haven't begun to optimize, document or clean it up, but the logic
is working.

'From a click event***********
rst.Open strSQL, con1, 3
intTemp = rst.Fields.Count
intTemp2 = rst.RecordCount
'ReDim arrData(3, intTemp)
ReDim arrTemp(intTemp, intTemp2)
arrTemp = rst.GetRows()
Do While cmbName.ListCount > 0
cmbName.RemoveItem Index:=0
Loop
If Len(Nz(cmbName.Value)) > 0 Then
arrTemp = SearchMultiDmArray(arrTemp, cmbName.Value)
Else
QuickSort arrTemp, 0
End If

For intTemp = (intTemp2 - 1) To 0 Step -1
If UBound(arrTemp, 1) = 2 Then
strTemp = arrTemp(0, intTemp) & ";" & arrTemp(1, intTemp) & ";" &
arrTemp(2, intTemp)
cmbName.AddItem strTemp, Index:=0
ElseIf arrTemp(3, intTemp) > 0 Then
strTemp = arrTemp(0, intTemp) & ";" & arrTemp(1, intTemp) & ";" &
arrTemp(2, intTemp)
cmbName.AddItem strTemp ', Index:=0
End If
Next intTemp
'End of relevant click event code*******

'Receives an array to be searched and the string value containing all the
search terms,
' returns an array sorted by the number of results and alphabetized
Public Function SearchMultiDmArray(ByRef arrTemp(), txtWords As String)
Dim arrWords() As String
Dim arrData() ' As String
Dim iA As Integer
Dim iB As Integer
Dim iC As Integer
Dim iD As Integer
Dim iE As Integer
Dim iG As Integer
Dim sA As String
Dim sB As String
Dim sC As String
Dim sD As String
arrWords() = Split(txtWords, " ")
iA = UBound(arrWords) 'Number of words in search
iB = UBound(arrTemp, 1) 'Number of columns from recordset
iC = UBound(arrTemp, 2) 'Number of rows/records from recordset
ReDim arrData(iB + 1, iC) 'Extend Temp array by one column to store
match rate
For iD = 0 To iC 'For each record in the data array
iE = 0
For iG = 0 To iA
If InStr(arrTemp(0, iD), arrWords(iG)) > 0 Then
iE = iE + 1 'Count number of word matches in eac9h row
End If
Next iG
For iG = 0 To iB 'For each column in data array, copy values to new
array
arrData(iG, iD) = arrTemp(iG, iD)
Next iG
arrData(iB + 1, iD) = iE 'Write number of matches to extended column
Next iD
QuickSort arrData, 0
QuickSort arrData, 3
SearchMultiDmArray = arrData
End Function

Public Sub QuickSort(ByRef arrTemp(), iColSort As Integer)
Dim iLBound, iUBound, iTemp, iOuter, iMax As Integer
Dim iA, iB, iC As Integer
Dim sTemp As String
iOuter = 0
iMax = 0
iLBound = LBound(arrTemp, 2)
iUBound = UBound(arrTemp, 2)
If (iUBound - iLBound) Then
For iOuter = iLBound To iUBound
If arrTemp(iColSort, iOuter) > arrTemp(iColSort, iMax) Then
iMax = iOuter
End If
Next iOuter
iC = UBound(arrTemp, 1)
For iA = 0 To iC 'Move the swap the greatest value to the last element
iB = arrTemp(iA, iMax) 'and all of its columns
arrTemp(iA, iMax) = arrTemp(iA, iUBound)
arrTemp(iA, iUBound) = iB
Next iA
InnerQuickSort arrTemp, iLBound, iUBound, iColSort
End If
End Sub

Private Sub InnerQuickSort(ByRef arrTemp(), ByVal iLeftEnd As Long, _
ByVal iRightEnd As Long, ByVal iColNum As Integer)
Dim iLeftcur, iRightCur, iPivot, iTemp As Long
Dim iA, iB, iC As Integer
Dim sTemp As String
If iLeftEnd >= iRightEnd Then Exit Sub

iLeftcur = iLeftEnd
iRightCur = iRightEnd + 1
iPivot = arrTemp(iColNum, iLeftEnd)
iC = UBound(arrTemp, 1) 'determine number of columns in array
ReDim arrHold(iC)

Do
Do
iLeftcur = iLeftcur + 1
Loop While arrTemp(iColNum, iLeftcur) < iPivot

Do
iRightCur = iRightCur - 1
Loop While arrTemp(iColNum, iRightCur) > iPivot

If iLeftcur >= iRightCur Then Exit Do
For iA = 0 To iC 'Swap the values of all columns
iB = arrTemp(iA, iLeftcur)
arrTemp(iA, iLeftcur) = arrTemp(iA, iRightCur)
arrTemp(iA, iRightCur) = iB
Next iA
Loop
For iA = 0 To iC
arrHold(iA) = arrTemp(iA, iLeftEnd)
arrTemp(iA, iLeftEnd) = arrTemp(iA, iRightCur)
arrTemp(iA, iRightCur) = arrHold(iA)
'arrTemp(iColNum, iLeftEnd) = arrTemp(iColNum, iRightCur)
Next iA

InnerQuickSort arrTemp, iLeftEnd, iRightCur - 1, iColNum
InnerQuickSort arrTemp, iRightCur + 1, iRightEnd, iColNum

End Sub
'***********
raskew via AccessMonster.com said:
You might want to take a look at this article describing the use of multiple
parameters. I've used this and it works well if searching on a number field,
but am still struggling if dealing with a text field. Nonetheless, it's an
interesting concept.
http://www.fabalou.com/Access/Queries/MultipleParameters.asp

Bob

John said:
One method would involve using a temp table that is populated with just
the words you are looking for. You would have to clear the records and
create new records each time you run the query.

TableFindWords with field TheWord
Bob
Jones
Roofing
Company

SELECT YourTable.Company, Count(YourTable.Company) as TimesFound
FROM YourTable INNER JOIN TableFindWords
ON YourTable.Company LIKE "*" & TableFindWords.TheWord & "*"
GROUP BY YourTable.Company
Order By Count(YourTable.Company), YourTable.Company

I'll have to ponder if there are other methods that you could use.

'====================================================
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
'====================================================
I need to return a list of best/ranked matches from a field in a table. For
example, if the user searches for "Bob Jones Roofing Company", I'd like it to
[quoted text clipped - 8 lines]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top