Removing duplicate data while keeping one of every set

  • Thread starter Thread starter Nate
  • Start date Start date
N

Nate

I have a client list of 60,000 with aprox. 9,000 duplicated names and
adresses. I need to remove the sets of duplicated data, while keeping one of
each set. Any help would be great.

Thank you
 
Firstly, if you don't have one already, add an autonumber column to the table
to uniquely identify each row. Then use a query along these lines:

DELETE *
FROM Contacts AS C1
WHERE ContactID <>
(SELECT MIN(ContactID)
FROM Contacts AS C2
WHERE C2.LastName = C1.LastName
AND C2.FirstName = C1.FirstName
AND C2.AddressLine1 = C1.AddressLine1);

where ContactID is the autonumber column.

The above example assumes that duplication is identified by rows having the
same values in the LastName, FirstName and AddressLine1 columns, but you'll
be able to amend it easily if the basis of duplication is some other set of
columns. It also assumes of course that the values in these columns are
*exactly* the same.

AS always with this sort of operation its imperative that the table be
backed-up first of course.

Ken Sheridan
Stafford, England
 
Nate said:
I have a client list of 60,000 with aprox. 9,000
duplicated names and adresses. I need to remove
the sets of duplicated data, while keeping one of
each set. Any help would be great.

A problem sometimes encountered in similar situations is where you do not
have actual "duplicated" records, but multiple records for the same "entity"
(in your case, client) but with different data about the entity -- then it
is not so easy to choose which one to keep.

Larry Linson
Microsoft Office Access MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top