deduplicating

  • Thread starter Thread starter Alex
  • Start date Start date
A

Alex

Hi,
I have a list of about 80,000 records and i know that at least a quarter of
them are probably duplicates. I am reasonably crtain that the telephone
number in each record is accurate, so I ran a duplicate query and found some
22000 records ( so assume that about 11,000 records are duplciate).

Does anyone know if there is some code that will allow me to order the
telephone number in numeric order and run through each record deleting
duplciates where it finds them?

Thanks
A
 
Duplicate the table structure. Make the telephone number a unique field.
Append the old data to the new table. You will have a copy of your original
data for backup and a new table with no duplicate telephone numbers. HTH
 
One way to handle this would be to build a new table with a compound unique
index based on the nine fields. Then import all the records into this new
table and ignore the errors. Once the import is successful, delete the old
table and rename the new table to the old table's name.

Another way,
Build a query based on the table that will identify the primary key values
you want to keep and save that as qKeepThese. If you don't care which of
the "duplicate" records you want to keep then you can use the First
aggregate function to more-or-less randomly select one.

SELECT First(PrimaryKeyField) as FirstID
FROM TheTable
GROUP BY TelephoneNumber

DELETE DistinctRow T.*
FROM TheTable as T
WHERE T.PrimaryKeyField IN
(SELECT PrimaryKeyField
FROM TheTable LEFT JOIN QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

All in one query would be as follows - only works if field and table names
don't contain "Special" characters.

DELETE DistinctRow T.*
FROM TheTable as T
WHERE T.PrimaryKeyField IN
(SELECT PrimaryKeyField
FROM TheTable LEFT JOIN
(SELECT First(PrimaryKeyField) as FirstID
FROM TheTable
GROUP BY TelephoneNumber) AS QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)
 
Back
Top