Fuzzy Match on two tables

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Hi all, (Windows 2000, Access 2000/2002)
I have a situation that keeps coming up. I keep coming across data that is
in Excel, Word, or some other file with information that we have standards
for; namely Street Names for the whole city. Everything that I have set-up
has a drop-down connected to the "Street Name" table. Everything that I
receive is never correctly formatted.
I've looked over Douglas Steele's article: (not thoroughly)
http://www.accessmvp.com/djsteele/SmartAccess.html
April 2005: "Close" only counts in horseshoes...and databases

Question: Is there a way to compare data from two tables and update from the
tblStreet_Name.fldSt_Name to the tblDestination.fldSt_Name?

example:
W Third St (correct)
W. Third St. (incorrect)
W Third (incorrect)
 
Question: Is there a way to compare data from two tables and update from the
tblStreet_Name.fldSt_Name to the tblDestination.fldSt_Name?

Maybe a statement similar to: replace( tblDestination.fldSt_Name ) where (
tblDestination.fldSt_Name is like tblStreet_Name.fldSt_Name )

Unfortunately, my logic is to fuzzy and I can't seem to think of how
something like this would accomplished.

Thanks in advance,
NickX
 
Insert into Table1
Select * from Table2
Where Not(Exists(Select Table2.Key from table2 where
(Table1.Key=Table2.Key)))


If the tables are not setup the same, you will have to include the fields as
well.
 
Nick X said:
Maybe a statement similar to: replace( tblDestination.fldSt_Name ) where (
tblDestination.fldSt_Name is like tblStreet_Name.fldSt_Name )

Unfortunately, my logic is to fuzzy and I can't seem to think of how
something like this would accomplished.


It is essential to define what you mean by "like". A human
may be able to look at two addresses and say they are
variations of the same thing, but a computer must have exact
rules for determining if two things are "like" each other.

One way to approach this is the brute force method of using
another table that has a record for each address variation
that you have tripped over in the past along with your
standardized address.

Another approach that may or may not be useful to you is the
SoundX technique. Search the web and the Google newsgroup
archives for discussions about SoundX to see if it might
help.
 
Darren,
Thank you for your quick response.
If the tables are not setup the same, you will have to include the fields as
well.
Unfortunately, my main problem is that the data I am trying to edit comes
from Word documents and Excel spreadsheets, so there are no "key" fields that
are easily referenced.
Insert into Table1
Select * from Table2
Where Not(Exists(Select Table2.Key from table2 where
(Table1.Key=Table2.Key)))

Is there a way to make this a "like" statement (Table1.Key=Table2.Key)
that would insert wild cards

Table1.St_Name is like Table2.St_Name
W. Third St. is like W Third St
(incorrect) (correct)
change the first to the latter
These are not addresses, just raw street names. I will have to deal with
this until I can get everyone on the same page. I know that I will have to
do some manual editing, but if I could even get a small percentage changed
that would be great.

Thanks again,
NickX
 
Marshall,
Thank you for your quick response.
If you remeber we went 'round and 'round on "Need to simplify Switch()"
but I digress...

My main problem is that the data I am trying to edit comes
from Word documents and Excel spreadsheets. Everytime I turn around there
seems to be someone else wanting me to create a database based on a Word
document. said:
It is essential to define what you mean by "like". A human
may be able to look at two addresses and say they are
variations of the same thing, but a computer must have exact
rules for determining if two things are "like" each other.

These are not addresses, just raw street names. I will have to deal with
this until I can get everyone on the same page. I know that I will have to
do some manual editing, but if I could even get a small percentage changed
that would be great.

Table1.St_Name is like Table2.St_Name
W. Third St. is like W Third St
(incorrect) (correct)
change the first to the latter

What if we take the correct "W Third St" from Table2.St_Name, tell a query
to look at it as "W*THIRD*" so it finds the incorrect "W. Third St." or "W
Third" or "W. Third" in Table1.St_Name and replaces it with the correct "W
Third St" from Table2.St_Name. It all sounds so simple:)
Another approach that may or may not be useful to you is the
SoundX technique.

I've looked over Douglas Steele's article: (not thoroughly)
http://www.accessmvp.com/djsteele/SmartAccess.html
April 2005: "Close" only counts in horseshoes...and databases
I don't know how to bring the two tables(fields) together. I will read this
more thoroughly this weekend.

Well, I'm starting to ramble. Thanks much!
Nick X
 
Hi Nick,

There's no simple way of doing what you want, except to buy specialist
commercial software.

You might start by using the Replace() function to strip out any periods
and other punctuation. You could then regularise Street and Road and
Avenue and Av and so on into your preferred forms.

Likewise, you could (e.g.) a "West" at the beginning to a "W", though
while changing (say) "West 25th St" to "W 25th St" you must take care
not to "West Street" to "W Street".

Similarly you need to avoid changing "Avenue Road" to "Ave Rd" while
changing "Fifth Avenue" to "Fifth Ave".

Unless you have a small neat city, the best approach may be to parse the
string into its elements. If you search the newsgroups (eg at Google
Groups) for
nurick horse hockey
you'll find a VBA function that can parse most North American street
addresses; you may be able to modify it for your needs.

Doug's article and sample database are relevant too. If you find
yourself needing to match names through typing mitsakes and mispelings
(e.g. "Thrid St" with "Third St"). study what Doug says about the
Levenshtein distance.
 
Nick X said:
Darren,
Thank you for your quick response.

Unfortunately, my main problem is that the data I am trying to edit comes
from Word documents and Excel spreadsheets, so there are no "key" fields that
are easily referenced.


Is there a way to make this a "like" statement (Table1.Key=Table2.Key)
that would insert wild cards

Table1.St_Name is like Table2.St_Name
W. Third St. is like W Third St
(incorrect) (correct)
change the first to the latter
These are not addresses, just raw street names. I will have to deal with
this until I can get everyone on the same page. I know that I will have to
do some manual editing, but if I could even get a small percentage changed
that would be great.


A small percentage is certainly possible, but "small" is
rather subjective, isn't it ;-)

The way I have approached this kind of thing is to create a
table (maybe your table 2??) that has a wildcard pattern for
various variations of a street name in one column and a
second column that has the correct street name. For
example:

Table Streets:
Pattern CorrectedStreet
W* Third St* W Third St
W* 3rd St* W Third St
N* Penn* Ave* N Pensylvania Ave

Then you can use a query to pick out the "correct" name:

SELECT tblData.Street, tblData,fielda, . . .,
Streets.CorrectedStreet
FROM tblData LEFT JOIN Streets
ON tblData.Street Like Streets.Pattern

If a funky street name does not match a pattern in the table
the CorrectedStreet field in the query will be Null. You
can filter the query for this situation to provide a list of
streets that need to be added to the Streets table.

If more than one pattern matches a funky steet, you will get
two records in the query so you can see what the
alternatives are and refine the matching patterns.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top