Retain rows based on combination of fields that are distinct

  • Thread starter Thread starter Ann Scharpf via AccessMonster.com
  • Start date Start date
A

Ann Scharpf via AccessMonster.com

I have a 51,000 data table that's been exported from another system. About
11,000 rows would be considered duplicate data and should be deleted. The
table has a total of nine fields. Of those fields, a combination of three
identify a "distinct entry." The other fields contain data that I need to
retain but are irrelevant to identifying duplicates for deletion.

I've tried the SELECT DISTINCT function but you can't include the "extra"
fields and have the DISTINCT work. All the rows become distinct when I
include the other fields. I've looked in the help and at the postings here
but I can't figure out how to use several fields to identify a DISTINCT
record but still keep the other fields in the table.

I'd appreciate any information you can give me. Thanks for your help.

Ann
 
Ann

If only three of the fields define "duplicates", how do you know which
record has the "correct" data for the other 6 fields, that you need to
"retain"?

One way to do this is to add an artificial key to the table (autonumber
field). Then you can do a query to identify those records that have are
"duplicates" based on the three key fields and delete all but one record,
although you are still stuck with the problem identified above.

When I run into cases like this, I generally just write some code that sorts
the entire recordset, then loops through it one record at a time. If the
record is a "duplicate" of the one above it, then I delete it (this method
also has the problem indicated above). If you need more info on either of
these techniques, reply to the group.

Dale
 
Dear Ann:

Consider a simple case of two rows. In the 3 columns you use for
distinctness, they are the same. In the other 6 columns they are completely
different.

It is not clear to me whether you want to delete duplicates, or just report
distinct values of the first 3 columns, along with some values for the other
6.

Here's one way:

SELECT Column1, Column2, Column3,
FIRST(Column4) AS C4,
FIRST(Column5) AS C5,
FIRST(Column6) AS C6,
FIRST(Column7) AS C7,
FIRST(Column8) AS C8,
FIRST(Column9) AS C9
FROM YourTable
GROUP BY Column1, Column2, Column3

Replace ColumnX and YourTable with the actual names of your columns and
table.

When there are more than one row in the table with the same Column1,
Column2, and Column3, it will show only one of them, with values C4-C9 all
from the same row. Which row is arbitrary, and could change over time. I'm
not sure I see the utility of doing it this way, choosing one of several
rows randomly, but it seems to fit what you described.

If the rows are uniquely described by those three columns and perhaps one
more, say, a date/time column, then a query could be built that would return
only that one.

Please let me know if this helped, and if I can be of any other assistance.

Tom Ellison
 
Thanks so much, Tom! This was EXACTLY what I needed. I know it seems odd to
be arbitrary about which data I keep but the exporting system just created
some duplicate records that I need to eliminate. This worked perfectly. I
have printed it out and am keeping it in my Access binder.
 
Back
Top