Get rid of "Duplicate Records" in a table

A

ali

I have a table: (Dummy Scenario)

-Auto_number
-Employee_ID
-Name
-Join_Date
-Employee_Type
-Position

--------------------------------------------------------------------------------------
Problem:

Due to some importation reasons.... some entries are duplicated. They have
exactly same data but they occupy more than 1 row, maybe 2 or three

--------------------------------------------------------------------------------------
I want to:

1) Retrieve those duplicated entries.
2) Make all entries in my table unique by getting rid off them.


Dear experts, thanks a lot !
 
A

Allen Browne

Use a subquery to see if there is a "duplicate". This kind of thing:

DELETE FROM Table1
WHERE [Auto_number] >
(SELECT Min([Auto_number] AS MinID
FROM Table1 AS Dupe
WHERE Dupe.[Employee_ID] = Table1.[Employee_ID]
AND Dupe.[Name] = Table1.[Name]
AND ...
AND Dupe.[Postion] = Table1.[Position]);

The core idea here is to retain the lowest Auto_number value. If there are
other Auto_number values that are higher than the lowest one and matches on
all the fields that define "duplicate", then dump those records. The WHERE
clause of the subquery will need to contain all the fields that define
"duplicate" for you.

If subqueries are new, here's an introduction:
http://allenbrowne.com/subquery-01.html

You stated that this is a dummy scenario, but I'll just mention that you
could run into problems using Name and Position as field names. Here's a
list of the names to avoid:
http://allenbrowne.com/AppIssueBadWord.html
 
A

ali

Great method !, it works ! and big thanks for the tips !

However,

I have another table without "primary keys" (i know it goes against DB
design therory!, but it just happens !),

If i do not have "primary key", how can i get rid of duplicate data ?



Thanks a lot allen !


--
Allen Phailat Wongakanit


Allen Browne said:
Use a subquery to see if there is a "duplicate". This kind of thing:

DELETE FROM Table1
WHERE [Auto_number] >
(SELECT Min([Auto_number] AS MinID
FROM Table1 AS Dupe
WHERE Dupe.[Employee_ID] = Table1.[Employee_ID]
AND Dupe.[Name] = Table1.[Name]
AND ...
AND Dupe.[Postion] = Table1.[Position]);

The core idea here is to retain the lowest Auto_number value. If there are
other Auto_number values that are higher than the lowest one and matches on
all the fields that define "duplicate", then dump those records. The WHERE
clause of the subquery will need to contain all the fields that define
"duplicate" for you.

If subqueries are new, here's an introduction:
http://allenbrowne.com/subquery-01.html

You stated that this is a dummy scenario, but I'll just mention that you
could run into problems using Name and Position as field names. Here's a
list of the names to avoid:
http://allenbrowne.com/AppIssueBadWord.html

--
Allen Browne - Microsoft MVP. Perth, Western Australia

Reply to group, rather than allenbrowne at mvps dot org.

ali said:
I have a table: (Dummy Scenario)

-Auto_number
-Employee_ID
-Name
-Join_Date
-Employee_Type
-Position

--------------------------------------------------------------------------------------
Problem:

Due to some importation reasons.... some entries are duplicated. They have
exactly same data but they occupy more than 1 row, maybe 2 or three

--------------------------------------------------------------------------------------
I want to:

1) Retrieve those duplicated entries.
2) Make all entries in my table unique by getting rid off them.


Dear experts, thanks a lot !
 
A

Allen Browne

Add a primary key

If there is no way to distingish between duplicates, there is no way to
instruct Access which one to delete.

Add a primary key.

If you don't want to do that, create a query that deduplicates the data
(GROUP BY), and turn that into a Make Table query so you end up with a table
that is de-duplicated.

--
Allen Browne - Microsoft MVP. Perth, Western Australia

Reply to group, rather than allenbrowne at mvps dot org.

ali said:
Great method !, it works ! and big thanks for the tips !

However,

I have another table without "primary keys" (i know it goes against DB
design therory!, but it just happens !),

If i do not have "primary key", how can i get rid of duplicate data ?



Thanks a lot allen !


--
Allen Phailat Wongakanit


Allen Browne said:
Use a subquery to see if there is a "duplicate". This kind of thing:

DELETE FROM Table1
WHERE [Auto_number] >
(SELECT Min([Auto_number] AS MinID
FROM Table1 AS Dupe
WHERE Dupe.[Employee_ID] = Table1.[Employee_ID]
AND Dupe.[Name] = Table1.[Name]
AND ...
AND Dupe.[Postion] = Table1.[Position]);

The core idea here is to retain the lowest Auto_number value. If there
are
other Auto_number values that are higher than the lowest one and matches
on
all the fields that define "duplicate", then dump those records. The
WHERE
clause of the subquery will need to contain all the fields that define
"duplicate" for you.

If subqueries are new, here's an introduction:
http://allenbrowne.com/subquery-01.html

You stated that this is a dummy scenario, but I'll just mention that you
could run into problems using Name and Position as field names. Here's a
list of the names to avoid:
http://allenbrowne.com/AppIssueBadWord.html

ali said:
I have a table: (Dummy Scenario)

-Auto_number
-Employee_ID
-Name
-Join_Date
-Employee_Type
-Position

--------------------------------------------------------------------------------------
Problem:

Due to some importation reasons.... some entries are duplicated. They
have
exactly same data but they occupy more than 1 row, maybe 2 or three

--------------------------------------------------------------------------------------
I want to:

1) Retrieve those duplicated entries.
2) Make all entries in my table unique by getting rid off them.


Dear experts, thanks a lot !
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top