Duplicate Record Mess

G

Guest

I'm working with a 75,000 record database in Access. The problem of Duplicate
records has been going on for months and my task is to eliminate all
duplicates. I wrote a SQL querie that will search the "NAMES" table for
duplicates and write them to a seperate table. Works great ... only 18,000
duplicate records. :-/ Now I need to stop the entry of duplicate records. So
I need a query that will check a temperary table against the "NAMES" table
and stop dupes from being added to "NAMES" table.

I first need to look at FNAME field and compare duplicates against LNAME. If
there are two John Does, I then need to look at the ADD1 field to see if the
John Doe has the same address (if so the records are written to a "DUPE"
table) if address is different, I look at the EMAIL field. If there is a
duplicat John Doe with a duplicate email address, it is written to the "DUPE"
table.

The single table query looks like this:
INSERT INTO [Dupe]
SELECT NAMES.LNAME AS LNAME, NAMES.FNAME AS FNAME, NAMES.ADD1 AS ADD1,
NAMES.EMAIL AS EMAIL, NAMES.ID AS ID, NAMES.ENTRYDATE AS ENTRYDATE
FROM [NAMES]
WHERE ((([NAMES].[LNAME]) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1]))) AND
((([NAMES].[FNAME]) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.Add1;

How do I write something that compare 2 tables in the same manner?
 
J

Jeff Boyce

You can create a query that joins on all the fields you mentioned, between
two tables. You can then click the SQL view button to view the SQL.

By the way, if John D. and his son John both live at the same address, and
both use the family's email address, you'll be eliminating one of them...

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
G

Guest

Thanks for the quick reply. I'm a newbe at this so I will study "joins" and
work on it.

Ya, I know the solution isn't perfect and there are several scenarios to
consider. That's why I'm writing the dupe records to a table, so we can
manually look at the data to determine actual dupes. When things are slow we
all "do dupes"... we only have abt 18,000 to look at! That's why I'm trying
to nip the problem in the bud.

--
Peace,
Michael


Jeff Boyce said:
You can create a query that joins on all the fields you mentioned, between
two tables. You can then click the SQL view button to view the SQL.

By the way, if John D. and his son John both live at the same address, and
both use the family's email address, you'll be eliminating one of them...

Regards

Jeff Boyce
Microsoft Office/Access MVP


MeloMike said:
I'm working with a 75,000 record database in Access. The problem of
Duplicate
records has been going on for months and my task is to eliminate all
duplicates. I wrote a SQL querie that will search the "NAMES" table for
duplicates and write them to a seperate table. Works great ... only 18,000
duplicate records. :-/ Now I need to stop the entry of duplicate records.
So
I need a query that will check a temperary table against the "NAMES" table
and stop dupes from being added to "NAMES" table.

I first need to look at FNAME field and compare duplicates against LNAME.
If
there are two John Does, I then need to look at the ADD1 field to see if
the
John Doe has the same address (if so the records are written to a "DUPE"
table) if address is different, I look at the EMAIL field. If there is a
duplicat John Doe with a duplicate email address, it is written to the
"DUPE"
table.

The single table query looks like this:
INSERT INTO [Dupe]
SELECT NAMES.LNAME AS LNAME, NAMES.FNAME AS FNAME, NAMES.ADD1 AS ADD1,
NAMES.EMAIL AS EMAIL, NAMES.ID AS ID, NAMES.ENTRYDATE AS ENTRYDATE
FROM [NAMES]
WHERE ((([NAMES].[LNAME]) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1]))) AND
((([NAMES].[FNAME]) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.Add1;

How do I write something that compare 2 tables in the same manner?
 
G

Guest

Boy, do I feel for you. I just got thru cleaning up a mere 5,000 record
recordset and that was more than enough for me!

Having just finished, another thing you should consider is finding the
duplicates by phone number. You would have to make sure all the phone
numbers are formatted the same way, but if you do your search by phone
number, you will also pick up any duplicates missed due to typo's.

The above cleaning, I found 1200 duplicates by name. I cleaned them all up
then ran a find duplicates query by phone number...I had another 1400
duplicates!!!! More than by name! Even after cleaning up all the names.

Good luck!


MeloMike said:
Thanks for the quick reply. I'm a newbe at this so I will study "joins" and
work on it.

Ya, I know the solution isn't perfect and there are several scenarios to
consider. That's why I'm writing the dupe records to a table, so we can
manually look at the data to determine actual dupes. When things are slow we
all "do dupes"... we only have abt 18,000 to look at! That's why I'm trying
to nip the problem in the bud.

--
Peace,
Michael


Jeff Boyce said:
You can create a query that joins on all the fields you mentioned, between
two tables. You can then click the SQL view button to view the SQL.

By the way, if John D. and his son John both live at the same address, and
both use the family's email address, you'll be eliminating one of them...

Regards

Jeff Boyce
Microsoft Office/Access MVP


MeloMike said:
I'm working with a 75,000 record database in Access. The problem of
Duplicate
records has been going on for months and my task is to eliminate all
duplicates. I wrote a SQL querie that will search the "NAMES" table for
duplicates and write them to a seperate table. Works great ... only 18,000
duplicate records. :-/ Now I need to stop the entry of duplicate records.
So
I need a query that will check a temperary table against the "NAMES" table
and stop dupes from being added to "NAMES" table.

I first need to look at FNAME field and compare duplicates against LNAME.
If
there are two John Does, I then need to look at the ADD1 field to see if
the
John Doe has the same address (if so the records are written to a "DUPE"
table) if address is different, I look at the EMAIL field. If there is a
duplicat John Doe with a duplicate email address, it is written to the
"DUPE"
table.

The single table query looks like this:
INSERT INTO [Dupe]
SELECT NAMES.LNAME AS LNAME, NAMES.FNAME AS FNAME, NAMES.ADD1 AS ADD1,
NAMES.EMAIL AS EMAIL, NAMES.ID AS ID, NAMES.ENTRYDATE AS ENTRYDATE
FROM [NAMES]
WHERE ((([NAMES].[LNAME]) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1]))) AND
((([NAMES].[FNAME]) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.Add1;

How do I write something that compare 2 tables in the same manner?
 
G

Guest

Thanks Jeff,

Here's the code and the problem now is:

If there is a "Jim Shoe" in the "HuskyTemp" table and a duplicate (2
records) of "Jim Shoe" in the "NAMES" table, the 2 records from the "NAMES"
table will be written to the "DupeHusky" table.
BUT
If there is a "Jim Shoe" record in the "HuskyTemp" table and a duplicate
"Jim Shoe" record in the "NAMES" table it is _not_ written in the "DupeHusky"
table like it should be.

SELECT NAMES.LNAME, NAMES.FNAME, NAMES.ADD1, NAMES.EMAIL, NAMES.ID,
NAMES.ENTRYDATE INTO DupeHusky
FROM [NAMES] INNER JOIN HuskyTemp ON (NAMES.EMAIL = HuskyTemp.Email) AND
(NAMES.ADD1 = HuskyTemp.ADD1) AND (NAMES.LNAME = HuskyTemp.LNAME) AND
(NAMES.FNAME = HuskyTemp.FNAME)
WHERE (((NAMES.LNAME) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])) AND
((NAMES.FNAME) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY [FNAME],[ADD1]
HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.ADD1;

Thanks for your time and help.
--
Peace,
Michael


Jeff Boyce said:
You can create a query that joins on all the fields you mentioned, between
two tables. You can then click the SQL view button to view the SQL.

By the way, if John D. and his son John both live at the same address, and
both use the family's email address, you'll be eliminating one of them...

Regards

Jeff Boyce
Microsoft Office/Access MVP


MeloMike said:
I'm working with a 75,000 record database in Access. The problem of
Duplicate
records has been going on for months and my task is to eliminate all
duplicates. I wrote a SQL querie that will search the "NAMES" table for
duplicates and write them to a seperate table. Works great ... only 18,000
duplicate records. :-/ Now I need to stop the entry of duplicate records.
So
I need a query that will check a temperary table against the "NAMES" table
and stop dupes from being added to "NAMES" table.

I first need to look at FNAME field and compare duplicates against LNAME.
If
there are two John Does, I then need to look at the ADD1 field to see if
the
John Doe has the same address (if so the records are written to a "DUPE"
table) if address is different, I look at the EMAIL field. If there is a
duplicat John Doe with a duplicate email address, it is written to the
"DUPE"
table.

The single table query looks like this:
INSERT INTO [Dupe]
SELECT NAMES.LNAME AS LNAME, NAMES.FNAME AS FNAME, NAMES.ADD1 AS ADD1,
NAMES.EMAIL AS EMAIL, NAMES.ID AS ID, NAMES.ENTRYDATE AS ENTRYDATE
FROM [NAMES]
WHERE ((([NAMES].[LNAME]) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1]))) AND
((([NAMES].[FNAME]) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.Add1;

How do I write something that compare 2 tables in the same manner?
 
J

Jeff Boyce

Michael

Let me try to paraphrase...

If you have more than one record in NAMES that match on whatever field(s),
you check another table (?HuskyTemp) to see if there's a copy there. If so,
you write ?all matching records to "DupeHusky".

If you have a match (one record) between NAMES and HuskyTemp, you (still)
want to write to DupeHusky.

There are other potential conditions -- are these not of concern? For
instance, if you have more than one (duplicate), but NO record in HuskyTemp,
what do you do?

Just some rambling thoughts...

* You might not need to write entire records if you have recordIDs -- just
keep the IDs
* Could you use a Find Duplicates query alone, instead of writing records?
* If it doesn't matter how many dupes in NAMES, just that there's a match
in HuskyTemp, why bother checking for dupes in NAMES?

Regards

Jeff Boyce
Microsoft Office/Access MVP


MeloMike said:
Thanks Jeff,

Here's the code and the problem now is:

If there is a "Jim Shoe" in the "HuskyTemp" table and a duplicate (2
records) of "Jim Shoe" in the "NAMES" table, the 2 records from the
"NAMES"
table will be written to the "DupeHusky" table.
BUT
If there is a "Jim Shoe" record in the "HuskyTemp" table and a duplicate
"Jim Shoe" record in the "NAMES" table it is _not_ written in the
"DupeHusky"
table like it should be.

SELECT NAMES.LNAME, NAMES.FNAME, NAMES.ADD1, NAMES.EMAIL, NAMES.ID,
NAMES.ENTRYDATE INTO DupeHusky
FROM [NAMES] INNER JOIN HuskyTemp ON (NAMES.EMAIL = HuskyTemp.Email) AND
(NAMES.ADD1 = HuskyTemp.ADD1) AND (NAMES.LNAME = HuskyTemp.LNAME) AND
(NAMES.FNAME = HuskyTemp.FNAME)
WHERE (((NAMES.LNAME) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])) AND
((NAMES.FNAME) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1]
HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.ADD1;

Thanks for your time and help.
--
Peace,
Michael


Jeff Boyce said:
You can create a query that joins on all the fields you mentioned,
between
two tables. You can then click the SQL view button to view the SQL.

By the way, if John D. and his son John both live at the same address,
and
both use the family's email address, you'll be eliminating one of them...

Regards

Jeff Boyce
Microsoft Office/Access MVP


MeloMike said:
I'm working with a 75,000 record database in Access. The problem of
Duplicate
records has been going on for months and my task is to eliminate all
duplicates. I wrote a SQL querie that will search the "NAMES" table for
duplicates and write them to a seperate table. Works great ... only
18,000
duplicate records. :-/ Now I need to stop the entry of duplicate
records.
So
I need a query that will check a temperary table against the "NAMES"
table
and stop dupes from being added to "NAMES" table.

I first need to look at FNAME field and compare duplicates against
LNAME.
If
there are two John Does, I then need to look at the ADD1 field to see
if
the
John Doe has the same address (if so the records are written to a
"DUPE"
table) if address is different, I look at the EMAIL field. If there is
a
duplicat John Doe with a duplicate email address, it is written to the
"DUPE"
table.

The single table query looks like this:
INSERT INTO [Dupe]
SELECT NAMES.LNAME AS LNAME, NAMES.FNAME AS FNAME, NAMES.ADD1 AS ADD1,
NAMES.EMAIL AS EMAIL, NAMES.ID AS ID, NAMES.ENTRYDATE AS ENTRYDATE
FROM [NAMES]
WHERE ((([NAMES].[LNAME]) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP
BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1]))) AND
((([NAMES].[FNAME]) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.Add1;

How do I write something that compare 2 tables in the same manner?
 
G

Guest

Jeff,

The record IDs are assigned when the "HuskyTemp" records are added to the
"NAMES" table.

I started with a Find Duplicates query in "NAMES" to build the "HuskyTemp"
to "NAMES" duplicate query. We currently have 18,000 dupes in "NAMES" which
we are addressing, I'm trying to stop the entry of dupes first. Then we can
weed out old dupes in "NAMES" as time goes on.

If the boss gets tired of all the existing dupes in "NAMES" I will add to
the query a statement to delete the dupes with the highest ID which will
preserve the first enterd dupe but she doesn't want to use a "search and
destroy" technique yet.
--
Peace,
Michael


Jeff Boyce said:
Michael

Let me try to paraphrase...

If you have more than one record in NAMES that match on whatever field(s),
you check another table (?HuskyTemp) to see if there's a copy there. If so,
you write ?all matching records to "DupeHusky".

If you have a match (one record) between NAMES and HuskyTemp, you (still)
want to write to DupeHusky.

There are other potential conditions -- are these not of concern? For
instance, if you have more than one (duplicate), but NO record in HuskyTemp,
what do you do?

Just some rambling thoughts...

* You might not need to write entire records if you have recordIDs -- just
keep the IDs
* Could you use a Find Duplicates query alone, instead of writing records?
* If it doesn't matter how many dupes in NAMES, just that there's a match
in HuskyTemp, why bother checking for dupes in NAMES?

Regards

Jeff Boyce
Microsoft Office/Access MVP


MeloMike said:
Thanks Jeff,

Here's the code and the problem now is:

If there is a "Jim Shoe" in the "HuskyTemp" table and a duplicate (2
records) of "Jim Shoe" in the "NAMES" table, the 2 records from the
"NAMES"
table will be written to the "DupeHusky" table.
BUT
If there is a "Jim Shoe" record in the "HuskyTemp" table and a duplicate
"Jim Shoe" record in the "NAMES" table it is _not_ written in the
"DupeHusky"
table like it should be.

SELECT NAMES.LNAME, NAMES.FNAME, NAMES.ADD1, NAMES.EMAIL, NAMES.ID,
NAMES.ENTRYDATE INTO DupeHusky
FROM [NAMES] INNER JOIN HuskyTemp ON (NAMES.EMAIL = HuskyTemp.Email) AND
(NAMES.ADD1 = HuskyTemp.ADD1) AND (NAMES.LNAME = HuskyTemp.LNAME) AND
(NAMES.FNAME = HuskyTemp.FNAME)
WHERE (((NAMES.LNAME) In (SELECT [LNAME] FROM [NAMES] As Tmp GROUP BY
[LNAME],[ADD1] HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])) AND
((NAMES.FNAME) In (SELECT [FNAME] FROM [NAMES] As Tmp GROUP BY
[FNAME],[ADD1]
HAVING Count(*)>1 And [ADD1] = [NAMES].[ADD1])))
ORDER BY NAMES.LNAME, NAMES.ADD1;

Thanks for your time and help.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top