Matching slightly dissimilar fields

S

slakjaw

Greetings to all, this is my first post and (wouldn't you know it) I'v
got a problem!
I'm attempting to match two slightly dissimilar text fields.

I have 2 similar tables. Each contains records of companies, wit
fields such as COMPANY_NAME, PRIMARY_CONTACT, ADDRESS, PHONE...etc. On
table (Table B) has about 2,700 records and is much smaller than th
other table (Table A), which has 93,000 records. The smaller table WA
NOT created from the larger table and they do not share common uniqu
identifiers. Both tables are company listings, so their respectiv
fields contain similar kinds of data, though the field names of the tw
tables are not identical. The tables come from different source
entirely and just happen to both be listings of companies. The table
are not well designed to be properly relational relative to othe
tables within a normalized database. Most of the fields of both table
are irrelevant to my purpose, which is matching each the compan
records in Table B to that company’s record in Table A

Notes:
Table A contains numerous COMPANY_NAME field duplicates, because eac
company facility location is a record (not normalized)
Table A has two text fields (that I’m interested in) that Table
does not have (field x and field y).
I have determined that the tables have fewer than 500 exac
COMPANY_NAME field matches, but I know that nearly all of the 2,70
companies listed in table B are also companies that are listed in tabl
A. A major part of the problem seems to be that there are many ways t
spell company names, so there are few exact matches. There are eve
more ways to represent or misrepresent addresses, i.e. with or withou
suite numbers, numerals spelled out or not, Av./Av/Ave/Ave.
bl./bl/blvd./blvd, Saint or St. …the permutations go on and on…
I am using Access 2002, but also have Excel 2002 available

My problem is as follows:
I want to create a new table (Table C).
Table C would contain all table B company records that correspond t
table A company records and I need the x and y fields (containing data
from table A in the new table C.

I would be grateful if someone could suggest an approach.

Thank
 
J

John Nurick

There's no easy way of doing this. What I'd do is:

1) Make sure that Table A has a unique identifier field. If necessary
add one. Let's call this KeyA.
2) Add an new, empty field to TableB of a compatible data type. Let's
call it KeyB.

3) Create an update query that joins the tables on fields that will pick
up the exact matches (e.g. COMPANY_NAME and POSTAL_CODE) and updates
KeyB to the value of KeyA in the corresponding record.

The situation now is that records in B that have been matched have
values in KeyB. The ones yet to be matched - the ones you're still
interested in - have NULL in KeyB.

Conversely, the records in A that you're still interested in are the
ones whose KeyA value cannot be found in any KeyB.

4) So you can now create two queries to return these. The SQL view will
be something like:

qryB: SELECT * FROM TableB WHERE KeyB IS NULL;

qryA: SELECT * FROM TableA WHERE KeyA NOT IN
(SELECT KeyB FROM TableB WHERE KeyB IS NOT NULL);

5) The next stage is to create a series of queries that join qryA and
qryB on an expression that will give you a more or less good match.

For instance, if some of the records contain company registration
numbers you could try something like
WHERE (qryA.REG_NO = qryB.COMPANY_NUMBER)
AND (qryA.STATE = qryB.STATE)
which should give an exact match. Or this
WHERE (Left(qryA.COMPANY_NAME,5) = Left(qryB.COMPANY_NAME,5))
AND (Left(qryA.POSTAL_CODE,5) = Left(qryB.POSTAL_CODE,5)

Set up each of these queries so it displays enough fields from each
table for you to decide whether each match made by the query is a real
one. If they all are, you can use an update query with the same WHERE
condition to update the KeyB value in the corresponding records in
TableB; otherwise, either refine the WHERE condition or add the KeyB
values manually.

Eventually you'll find yourself with a small enough number of records
left in qryB that you can just look for them "by eye" in qryA.
 
C

Chris Wagner

I used to do this type of thing a few years back with addresses. I would run
both tables through a mailing software program and apply the zip5, zip4, and
the delivery point code (a 2 digit code, 3 with the correction code)
comparing the two twelve digit numbers and 80 percent of the company name (
sometimes using a soundex code). run a report on the matches where the names
do not match and see what the problem might be.

Mailers at Melissadata.com has a free 45 day trial to see if it works
although they are kind of expensive for purchase.

There is another one that runs about $78 although I have not seen it run,
adn I cant remember the name either. Tried looking on the web under 'bulk
mail software' presort. There are a lot out there

http://www.hardin-soft.com/ was one I saw for cheap.

Just a thought
Chris Wagner



John said:
There's no easy way of doing this. What I'd do is:

1) Make sure that Table A has a unique identifier field. If necessary
add one. Let's call this KeyA.
2) Add an new, empty field to TableB of a compatible data type. Let's
call it KeyB.

3) Create an update query that joins the tables on fields that will pick
up the exact matches (e.g. COMPANY_NAME and POSTAL_CODE) and updates
KeyB to the value of KeyA in the corresponding record.

The situation now is that records in B that have been matched have
values in KeyB. The ones yet to be matched - the ones you're still
interested in - have NULL in KeyB.

Conversely, the records in A that you're still interested in are the
ones whose KeyA value cannot be found in any KeyB.

4) So you can now create two queries to return these. The SQL view will
be something like:

qryB: SELECT * FROM TableB WHERE KeyB IS NULL;

qryA: SELECT * FROM TableA WHERE KeyA NOT IN
(SELECT KeyB FROM TableB WHERE KeyB IS NOT NULL);

5) The next stage is to create a series of queries that join qryA and
qryB on an expression that will give you a more or less good match.

For instance, if some of the records contain company registration
numbers you could try something like
WHERE (qryA.REG_NO = qryB.COMPANY_NUMBER)
AND (qryA.STATE = qryB.STATE)
which should give an exact match. Or this
WHERE (Left(qryA.COMPANY_NAME,5) = Left(qryB.COMPANY_NAME,5))
AND (Left(qryA.POSTAL_CODE,5) = Left(qryB.POSTAL_CODE,5)

Set up each of these queries so it displays enough fields from each
table for you to decide whether each match made by the query is a real
one. If they all are, you can use an update query with the same WHERE
condition to update the KeyB value in the corresponding records in
TableB; otherwise, either refine the WHERE condition or add the KeyB
values manually.

Eventually you'll find yourself with a small enough number of records
left in qryB that you can just look for them "by eye" in qryA.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top