Creating a Query that Deletes Duplicates by Date

M

mitchpark

Alright, I am trying to create a query that will sort by First Name,
Last Name, Perm Address, and date updated that will find duplicates
between FNAME, LNAME, PADD, and will use the date updated column to
delete the oldest duplicates and leave the most recent. As I have a
database of roughly ~7500 names I need a way to sort out duplicates
that were added on different dates and leave the most recent.
Therefore, is there any way to create such a query (in 2007) and if so
could someone explain it on a very basic level as my skills with
access are fairly weak and I have only been using it for about two
weeks. At this point I really only know how to use the query wizard
from the table view. Thanks for any help.

~Mitch
 
A

Allen Browne

To achieve this, you will need:
- a primary key field in your table (e.g. an autonumber),
- a date field that indicates when the records were entered,
- a willingness to learn about subqueries.

This example assumes the primary key is named ID, the date field is named
EntryDate, the table is called Table1, and there are always values in FNAME,
LNAME, and PADD.

DELETE FROM Table1 WHERE ID <>
(SELECT TOP 1 ID FROM Table1 AS Dupe
WHERE Dupe.FNAME = Table1.FNAME
AND Dupe.LNAME = Table1.LNAME
AND Dupe.PADD = Table1.PADD
ORDER BY Dupe.EntryDate DESC, ID DESC)

If subqueries are a new concept, see:
http://allenbrowne.com/subquery-01.html
 
M

mitchpark

To achieve this, you will need:
- a primary key field in your table (e.g. an autonumber),
- a date field that indicates when the records were entered,
- a willingness to learn about subqueries.

This example assumes the primary key is named ID, the date field is named
EntryDate, the table is called Table1, and there are always values in FNAME,
LNAME, and PADD.

DELETE FROM Table1 WHERE ID <>
(SELECT TOP 1 ID FROM Table1 AS Dupe
WHERE Dupe.FNAME = Table1.FNAME
AND Dupe.LNAME = Table1.LNAME
AND Dupe.PADD = Table1.PADD
ORDER BY Dupe.EntryDate DESC, ID DESC)

If subqueries are a new concept, see:
   http://allenbrowne.com/subquery-01.html

First off, thank you for the quick response Allen I guess im still a
little lost on this just because of my inexperience. The table name is
"Master Database" the fields for the duplicate search are "LNAME"
"FNAME" and "PADD" with "Date Updated" as the criteria for which ones
get deleted. I guess I don't understand where I enter the commands
that form the query. Do I bring up the Query wizard, and run one for
duplicates based on FNAME LNAME and PADD then switch to design view
and add the subquery somewhere there? Or do I use query design in
which case I could use a step by step where to click and input data
into which fields (even though I understand its a huge pain).
Subqueries are new to me and the tutorial you linked just didn't help
this process click for me. If you could help me out even more i would
really appreciate it.

~Mitch
 
M

mitchpark

First off, thank you for the quick response Allen I guess im still a
little lost on this just because of my inexperience. The table name is
"Master Database" the fields for the duplicate search are "LNAME"
"FNAME" and "PADD" with "Date Updated" as the criteria for which ones
get deleted. I guess I don't understand where I enter the commands
that form the query. Do I bring up the Query wizard, and run one for
duplicates based on FNAME LNAME and PADD then switch to design view
and add the subquery somewhere there? Or do I use query design in
which case I could use a step by step where to click and input data
into which fields (even though I understand its a huge pain).
Subqueries are new to me and the tutorial you linked just didn't help
this process click for me. If you could help me out even more i would
really appreciate it.

~Mitch

Also there is an ID field that is numerical.
 
L

Lord Kelvan

DELETE FROM [Master Database] WHERE ID <>
(SELECT TOP 1 ID FROM [Master Database] AS Dupe
WHERE Dupe.FNAME = Table1.FNAME
AND Dupe.LNAME = Table1.LNAME
AND Dupe.PADD = Table1.PADD
ORDER BY Dupe.[Date Updated] DESC, ID DESC)

to use the query
create a new query
close the popup box
click view in the menu bar
click sql view
paste the above into the window replacing what is there
run the query

hope this helps

Regards
Kelvan
 
M

mitchpark

DELETE FROM [Master Database] WHERE ID <>
(SELECT TOP 1 ID FROM [Master Database] AS Dupe
WHERE Dupe.FNAME = Table1.FNAME
AND Dupe.LNAME = Table1.LNAME
AND Dupe.PADD = Table1.PADD
ORDER BY Dupe.[Date Updated] DESC, ID DESC)

to use the query
create a new query
close the popup box
click view in the menu bar
click sql view
paste the above into the window replacing what is there
run the query

hope this helps

Regards
Kelvan

Thanks Kelven thats a great help, but what do I put in for LNAME,
FNAME, and PADD when it asks for parameter?
 
A

Allen Browne

If it asks for a parameter, it means that Access cannot find fields by that
name.

Make sure the name is correct. For example, is it F Name (with a space)? If
so, be sure to put square brackets around the name, i.e. [F Name]

--
Allen Browne - Microsoft MVP. Perth, Western Australia

Reply to group, rather than allenbrowne at mvps dot org.

DELETE FROM [Master Database] WHERE ID <>
(SELECT TOP 1 ID FROM [Master Database] AS Dupe
WHERE Dupe.FNAME = Table1.FNAME
AND Dupe.LNAME = Table1.LNAME
AND Dupe.PADD = Table1.PADD
ORDER BY Dupe.[Date Updated] DESC, ID DESC)

to use the query
create a new query
close the popup box
click view in the menu bar
click sql view
paste the above into the window replacing what is there
run the query

hope this helps

Regards
Kelvan

Thanks Kelven thats a great help, but what do I put in for LNAME,
FNAME, and PADD when it asks for parameter?
 
L

Lord Kelvan

no i didnt modify the query properly

DELETE FROM [Master Database] WHERE ID <>
(SELECT TOP 1 ID FROM [Master Database] AS Dupe
WHERE Dupe.FNAME = [Master Database].FNAME
AND Dupe.LNAME = [Master Database].LNAME
AND Dupe.PADD = [Master Database].PADD
ORDER BY Dupe.[Date Updated] DESC, ID DESC)

there that should work

Reagrds
Kelvan
 
M

mitchpark

no i didnt modify the query properly

DELETE FROM [Master Database] WHERE ID <>
(SELECT TOP 1 ID FROM [Master Database] AS Dupe
WHERE Dupe.FNAME = [Master Database].FNAME
AND Dupe.LNAME = [Master Database].LNAME
AND Dupe.PADD = [Master Database].PADD
ORDER BY Dupe.[Date Updated] DESC, ID DESC)

there that should work

Reagrds
Kelvan

Just got a chance to test this out and it worked perfectly, thank you
for all your help it is greatly appreciated.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top