Like operator and wildcards, compare fields

titlepusher

I have 2 tables. Both contain a Name field. I need to check [Name1] against
[Name2] and variants of [Name2]. Example: if [Name1] is "Old SPG Land
Co." and [Name2] is "SPG Land associates" OR "Old SPG Real Estate" OR
"LandCo1 & SPG Trust", I want a query to match [Name1] and [Name2] even
though they are not a perfect match, or really even close; just a match on a
"common pattern contained, or closeness or similarity or derivative of".
My purpose is to check all new business names we see for upcoming
transactions against a list of known fraudulent names. The scammers seem to
use variations of a few company names over and over. Thank you, tp
 
John W. Vinson

This is a very, very difficult task in general. The only common feature I see
here is the common occurrence of "SPG" - but that is no more salient to a
computer than the common occurrence of "Old" or "The" or even " ".

Of course if one name is "SPG Land Associates" and the other is "Swampy Peat
Grasslands Associates" the similarity is even less clear.

I could imagine a VBA routine parsing each text string into words, filtering
out stopwords that would be just too common ("the", "inc" etc.) and testing
matches, but it would still require extensive human review to find false hits
and misses.
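The word-parsing idea above might look roughly like the following. This is only a sketch, written in Python rather than VBA for brevity. The stopword list and the function names (significant_words, shared_words, flag_candidates) are made up for illustration, and treating "old" or "associates" as stopwords is an assumption you would want to tune against real data.

```python
import re

# Hypothetical stopword list; adjust it to the names you actually see.
STOPWORDS = {"the", "old", "inc", "co", "llc", "ltd", "and", "of",
             "associates", "trust"}

def significant_words(name):
    """Lowercase the name, split it into alphanumeric words, drop stopwords."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {w for w in words if w not in STOPWORDS}

def shared_words(name1, name2):
    """Return the non-stopword words that both names contain."""
    return significant_words(name1) & significant_words(name2)

def flag_candidates(new_name, known_names):
    """List each known name sharing at least one significant word with new_name."""
    hits = []
    for known in known_names:
        common = shared_words(new_name, known)
        if common:
            hits.append((known, sorted(common)))
    return hits

known = ["SPG Land associates", "Old SPG Real Estate",
         "LandCo1 & SPG Trust", "Swampy Peat Grasslands Associates"]
for known_name, common in flag_candidates("Old SPG Land Co.", known):
    print(known_name, common)
```

With these inputs the first three known names are flagged because they share "spg", while "Swampy Peat Grasslands Associates" is not, which is exactly the kind of miss that would still need human review.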
 
Fernando Loizides

Hi,

you could always try an edit distance algorithm. "Levenshtein distance"
should be good for you. It measures the minimum number of single-character
operations (delete, replace and insert) needed to transform one string into
another. The higher the number, the less similar the two strings are.
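For reference, a minimal sketch of the standard dynamic-programming form of Levenshtein distance, in Python. The similarity helper is not part of the algorithm itself; it is a hypothetical convenience that lowercases both strings and scales the distance by the longer length, and any cutoff you apply to it would be an assumption to tune.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def similarity(a, b):
    """Hypothetical helper: 1.0 for identical strings (ignoring case),
    falling toward 0.0 as the edit distance approaches the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"))  # 3
```

Note that whole-name distances can be large for names that only share one word, such as "Old SPG Land Co." and "LandCo1 & SPG Trust", so it may work better to compare the names word by word than as whole strings.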

Fernando Loizides
 
