Unique entries

S

Steven

I have an Access database with over 520,000 rows. I want eliminate those
records which have a duplicate serial number field. (This should reduce the
rows to 100,000 to 200,000.)

How the heck do I do eliminate duplicates?
 
J

Jeanette Cunningham

Hi Steven,
one way which is fairly easy is to create a new table for the data - based
on the current table fields and their properties. Put a unique index on the
serial number field.
Now append all the records to the new table - access will not append records
which have the same serial no more than once. You get only the records with
unique serial nos.


Jeanette Cunningham MS Access MVP -- Melbourne Victoria Australia
 
J

John W. Vinson

I have an Access database with over 520,000 rows. I want eliminate those
records which have a duplicate serial number field. (This should reduce the
rows to 100,000 to 200,000.)

How the heck do I do eliminate duplicates?

The simplest way would be to BACK UP YOUR DATABASE (you may want it back...).
Then create a new table by copying the existing table, *design mode only*, to
a new table name. Open the new table and add a unique Index to the serial
number.

Then run an Append query to append from the existing table into the new one.
You'll get a warning message like "182415 records were not added due to key
violations" - these are the dups.

Then delete the old table and rename the new one with the old table's name.

Do note that if you have records with duplicate serial numbers but DIFFERENT
values in other fields, this (or almost any other) approach will discard
possibly valid data without any warning or any recourse.
 
P

Paul Shapiro

You can start by eliminating the exact duplicates. When you create the new
table, do not yet assign the serial number as the primary key. Do an append
to the new table using a Select Distinct from the existing table to elimate
exact duplicates. Now in the new table you can create a group by query
something like:
Select serialNumber, count(*) as rowCount from NewTable Group By
serialNumber Having count(*) > 1;

That gets a list of all the serial numbers that have more than one row, but
the rows are not exact duplicates. If it's not too many, you can resolve the
differences manually. If it's too many, you could still decide to do
something like the solution below that just arbitrarily takes one of the
rows per serial number, or maybe you can find a way to identify important
differences.

For the future, you should always have a suitable primary key and any
identifiable uniqueness constraints on a table to prevent meaningless
duplicates. It's much easier to prevent the problem than to clean up the
trash.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top