duplicate address detection add-in for MS Access

P

phleduc

Does anybody know a plugin/addin tool to allow to detect duplicate
people/adresses in an Access database (between mulitple tabels or within the
same table)
The goal is to run a number of records against lists of imported records of
other sources.
The problem is that the fields content are not always comparable as well...
By preference a tool with the VBA available and distributable to third party
databases so it can be edited whereever possible.

Anyone has experience doing this before?

Philip
 
J

Jeff Boyce

Philip

I'm afraid you may be looking for USB - using someone's brain!

Are the following people the same person:
John Smith
John J. Smith
J. J. Smith
J. Smith, Jr.

Are the following addresses the same:
1234 5th St
1234 5th Street
1234 5th Ave
1243 5th St
1234 Fifth St.

Even if you found a plug-in/function that could resolve these, I'm not sure
you could find a function that would handle the entire universe of creative
ways folks can spell names and addresses.

Besides, these newsgroups have repeatedly reported instances of two people
spelling their names exactly the same way ... and in some instances, living
in the same town!

Good luck!

--
Regards

Jeff Boyce
www.InformationFutures.net

Microsoft Office/Access MVP


Microsoft IT Academy Program Mentor
http://microsoftitacademy.com/
 
J

John W. Vinson

Besides, these newsgroups have repeatedly reported instances of two people
spelling their names exactly the same way ... and in some instances, living
in the same town!

Such as my friends Fred Brown and Fred Brown... father and son, living at the
same address (or they were when I started here).

John W. Vinson [MVP]
 
T

Tom Wickerath

Hi Philip,

It's not an add-in, but Access MVP Doug Steele wrote an article for Smart
Access that shows two VBA-based methods of attempting to flag similar names
using the Soundex and Levenshtein Distance algorithms. You can then display
the records found, and let a human deside if they are the same or not.

http://www.accessmvp.com/djsteele/SmartAccess.html

You might be able to apply the same logic for addresses.

See the following download:
April 2005: "Close" only counts in horseshoes...and databases (467 KB)
A couple of techniques to help determine when entries are "close enough" to
be considered the same


Tom Wickerath
Microsoft Access MVP
https://mvp.support.microsoft.com/profile/Tom
http://www.access.qbuilt.com/html/expert_contributors.html
__________________________________________
 
T

Tony Toews [MVP]

Jeff Boyce said:
Besides, these newsgroups have repeatedly reported instances of two people
spelling their names exactly the same way ...

My brother, same last name as me which is rather uncommon, knows of
someone who has his same first, middle and last name. And the same
birth date. Unfortunately this other person has a lengthy criminal
record and thus isn't allowed to enter the U.S.A.

This causes my brother great amounts of trouble at the border when he
wants to cross into the US. So he leaves at least an extra hour in
his schedule.

Well, except for the last time. After being interrogated for ten
minutes someone came into the room and said my brother wasn't the same
person as the other guy so they let him enter.

Tony

--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
P

Paul Shapiro

You can get name and address standardization software, or you can send your
lists to outside services that will process it for you. Good software for
this is pricey. The cheapest options I saw 5 years ago were about $1500, and
$15,000 was not uncommon. It will not successfully process all your data,
and human review is still necessary for any unclear cases. Results from the
better software should be pretty good, but definitely not perfect. There
will be mistakes. There are web services that offer this functionality now
too. Searching on google for address standardization or address correction
should get you some likely candidates.
 
L

Linq Adams via AccessMonster.com

This kind of thing, for all the reasons already mentioned, always turns into
a mare's nest! A number of years ago, when I was in college, I moonlighted
doing data entry for one of the three largest banks in the country. They'd
bought another bank and needed to get names/addresses from the purchased
bank's IT systems standardized for use by their own systems. There own
humongous IT department reportedly spent six months and $200,000 before
giving up and hiring humans to audit the records and manually change the
entries. The formatting guidelines for the addresses alone were 12 pages long,
single spaced!

Good Luck, my friend, you're going to need it!
 
R

Rick Brandt

Linq said:
This kind of thing, for all the reasons already mentioned, always
turns into a mare's nest! A number of years ago, when I was in
college, I moonlighted doing data entry for one of the three largest
banks in the country. They'd bought another bank and needed to get
names/addresses from the purchased bank's IT systems standardized for
use by their own systems. There own humongous IT department
reportedly spent six months and $200,000 before giving up and hiring
humans to audit the records and manually change the entries. The
formatting guidelines for the addresses alone were 12 pages long,
single spaced!

Good Luck, my friend, you're going to need it!

Our Marketing department recently looked at a package that would use various
forms of "fuzzy logic" to find duplicates in your customer data. It ran around
70,000 dollars I believe. And even then it still required a human to confirm
that any matches found were actually duplicates.
 
G

Guest

Does anybody know a plugin/addin tool to allow to detect duplicate

No, sorry.
Anyone has experience doing this before?

In this country, (AUS) mail is delivered to an 'address point'. To get
the cheapest mail rates, you identify the address point before giving
the mail to the post office. To verify an address, you run it against
a set of all the address points in Australia. This is a big job, and is
normally only done by a mail house.

To verify a particular persons address, you run it against a database
of known peoples addresses - for example, a telephone book, or
a listing of all known medical doctors. This also removes addresses
that it knows to be invalid. You can't normally buy those address lists,
you have to send your data for matching.

When I'm just doing data cleaning, I can't do either of those things,
but those are the only services available, so I have to fall back on
just doing it myself. That has always meant running selection queries
and then eye-balling the data: First you match on SSN (TFN), Telephone,
or medical record number. Then on Post Code and Suburb. Then on
everything else... until you've done this, you wouldn't believe the
incredible variety of true addresses, let alone the variations on a theme.

Out of a thousand address, you're sure to get 10 forwarding addresses,
and one that looks like this:
Ward 3/4, Janefield Training Centre, private bag 1, suburb, zip
Try fitting that into your normalised number/street/suburb table!
Then compare it to the delivery address:
Ward 314, Private road, Janefield Training Centre, suburb, zip2
-- and remember that the zip code for a private bag (which is like
a post office box, but sorted at the mail sorting centre) is different
from the zip code for street delivery. And the private road is
actually a very long driveway.

So I had some sympathy for my bank when they changed the
post code on my PO box address: I understood the kind of data
cleaning exercise which would lead to that mistake. Rather less
when the customer service rep told me that they Never Change
the Address without a direction from the customer. None at all
when they denied all responsibility in writing. Slight rage when
they sent me a letter asking if I was satisfied with their response.
Irritation when they ignored that as well...

(david)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top