Query - finding dublicates

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Hello,

I have a table with dublicate records and I need to find a way of deleting
them without having to do it manually.

There are two fields, patient ID and month of supply which sometimes appear
twice and in this case would be identical. Is there a simple way of deleting
the dublicate records?

Many thanks for your help in advance,
 
BACK UP your data before you do this

One way to handle this would be to build a new table with a compound unique
index based on the patientID and Month fields. Then import all the records
into this new table and ignore the errors. Once the import is successful,
delete the old table and rename the new table to the old table's name.

To create a multiple field unique index
--Open up the WorkingTable in design mode
--Select View: Index from the menu
--Enter Name for Index in first blank cell under Index Name
--Select one field in the index under Field Name
--Set Unique to Yes
--Move down one line and select the next FieldName
--Continue moving down and selecting fieldnames until all needed are
included.
--Close the index window and close and save the table

Another way --
Build a query based on the table that will identify the primary key values
you want to keep and save that as qKeepThese. If you don't care which of
the "duplicate" records you want to keep then you can use the First
aggregate function to more-or-less randomly select one.

Query One: << This query is the key to identifying which records to keep>>
SELECT First(PrimaryKeyField) as FirstID
FROM TheTable
GROUP BY PatientID, [Month]

Query Two:
DELETE DistinctRow T.*
FROM TheTable as T
WHERE T.PrimaryKeyField IN
(SELECT PrimaryKeyField
FROM TheTable LEFT JOIN QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

All in one query would be as follows - only works if field and table names
don't contain "Special" characters.

DELETE DistinctRow T.*
FROM TheTable as T
WHERE T.PrimaryKeyField IN
(SELECT PrimaryKeyField
FROM TheTable LEFT JOIN
(SELECT First(PrimaryKeyField) as FirstID
FROM TheTable
GROUP BY PatientID, [Month]) AS QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..
 
Thank you John - that was very helpful!

I have managed to delete the dublicates succesfully. I used your second
method with two separate queries. I'm trying to develop my Access skills and
it would be helpful to understand what the second query does.

I understand that the first query identifies all those records that I wanted
to keep but now sure how the second one works.

Thanks,

Johanna
--
Johanna G


John Spencer said:
BACK UP your data before you do this

One way to handle this would be to build a new table with a compound unique
index based on the patientID and Month fields. Then import all the records
into this new table and ignore the errors. Once the import is successful,
delete the old table and rename the new table to the old table's name.

To create a multiple field unique index
--Open up the WorkingTable in design mode
--Select View: Index from the menu
--Enter Name for Index in first blank cell under Index Name
--Select one field in the index under Field Name
--Set Uniqonth]) AS QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

Johanna Gronlund said:
Hello,

I have a table with dublicate records and I need to find a way of deleting
them without having to do it manually.

There are two fields, patient ID and month of supply which sometimes
appear
twice and in this case would be identical. Is there a simple way of
deleting
the dublicate records?

Many thanks for your help in advance,
 
SELECT PrimaryKeyField
FROM TheTable LEFT JOIN QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null

This portion returns all primary key fields that were NOT identified in the
first query. It is similar to the query that the UNMATCHED Query Wizard
builds.
The LEFT JOIN says to return all records on the left side of the join (The
Table) and matches for the right side (qKeepThese). If there is no match
return nulls for all the fields in qKeepThese with one copy of the record in
TheTable.

So if there is no match between TheTable and QKeepThese the value of the
FirstID will be null. The WHERE clause then select all records where
FIRSTID is null. So the subquery returns a list of all records that don't
have a matching primary key.

Now the WHERE T.PrimaryKeyField IN clause says
Select all records that have a value in the Primary key field that are in
the results generated by the query inside the parentheses.

Hope that helps you understand how this worked.

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

Johanna Gronlund said:
Thank you John - that was very helpful!

I have managed to delete the dublicates succesfully. I used your second
method with two separate queries. I'm trying to develop my Access skills
and
it would be helpful to understand what the second query does.

I understand that the first query identifies all those records that I
wanted
to keep but now sure how the second one works.

Thanks,

Johanna
--
Johanna G


John Spencer said:
BACK UP your data before you do this

One way to handle this would be to build a new table with a compound
unique
index based on the patientID and Month fields. Then import all the
records
into this new table and ignore the errors. Once the import is
successful,
delete the old table and rename the new table to the old table's name.

To create a multiple field unique index
--Open up the WorkingTable in design mode
--Select View: Index from the menu
--Enter Name for Index in first blank cell under Index Name
--Select one field in the index under Field Name
--Set Uniqonth]) AS QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

Johanna Gronlund said:
Hello,

I have a table with dublicate records and I need to find a way of
deleting
them without having to do it manually.

There are two fields, patient ID and month of supply which sometimes
appear
twice and in this case would be identical. Is there a simple way of
deleting
the dublicate records?

Many thanks for your help in advance,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top