Deleting duplicates entries in Database using MS Access

narasimharaosama · Feb 4, 2007

Hii,

I have a problem in deleting duplicates from the tables in MS Access.I
have a table that contains about 40,000 rows and about 15 columns or
fields.Out of 15 I want to delete the entries that match on the 8
fields.That is not all the fields in the table are perfectly match.I
want to delete only on those specific field matches.I hope someone can
come up with a solution for me.

Thanks

Narasimha

Smartin · Feb 4, 2007

Hii,

I have a problem in deleting duplicates from the tables in MS Access.I
have a table that contains about 40,000 rows and about 15 columns or
fields.Out of 15 I want to delete the entries that match on the 8
fields.That is not all the fields in the table are perfectly match.I
want to delete only on those specific field matches.I hope someone can
come up with a solution for me.

Thanks

Narasimha

Namaste Narasimha,

I calculate 6435 combinations are possible taking 8 fields from a
population of 15 fields. You may need the assistance of a professional.

Is there any way you can narrow down the target?

John Spencer · Feb 4, 2007

Do you mean you want to identify duplicates based on 8 specific fields
of the 15 or do you mean you want to identify duplicates if any 8 of the
15 fields match?

If you mean a specific 8 fields, then you have a couple of possible
solutions.

One way to handle this would be to build a new table with a compound
unique index based on the eight fields. Then import all the records
into this new table and ignore the errors. Once the import is
successful, delete the old table and rename the new table to the old
table's name.

Another way, (BACK UP your data before you do this)
Assuming that your table has a single-field primary key would be to
build a query based on the table that will identify the primary key
values you want to keep and save that as qKeepThese. If you don't care
which of the "duplicate" records you want to keep then you can use the
First aggregate function to more-or-less randomly select one.

Query One: << This query is the key to identifying all the records to
keep>>
SELECT First(PrimaryKeyField) as FirstID
FROM TheTable
GROUP BY TelephoneNumber, Xfield, YField, ZField, <<List the all fields
that should identify unique records)

Query Two:
DELETE DistinctRow T.*
FROM TheTable as T
WHERE T.PrimaryKeyField IN
(SELECT PrimaryKeyField
FROM TheTable LEFT JOIN QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

All in one query would be as follows - only works if field and table
names don't contain "Special" characters.

DELETE DistinctRow T.*
FROM TheTable as T
WHERE T.PrimaryKeyField IN
(SELECT PrimaryKeyField
FROM TheTable LEFT JOIN
(SELECT First(PrimaryKeyField) as FirstID
FROM TheTable
GROUP BY TelephoneNumber) AS QKeepThese
ON TheTable.PrimaryKeyField= QKeepThese.FirstID
WHERE qKeepThese.FirstID is Null)

narasimharaosama · Feb 5, 2007

I want to delete the records that match on the 8 specified fields and
my new table should contain the records that contains all the 15
fields but
the ones that matchs on these 8 fields must be removed.I hope some
body can come up with the solution with this.

Thanks

Narasimha

John Spencer · Feb 5, 2007

Do you want to delete all the records that are duplicated? Or do you want
to retain one of the duplicated records?

And if you want to retain one of the duplicated records, what criteria do
you want to use to determine which one of the duplicates you want to retain
or do you just want to retain any one of the duplicated records?

Do you know what a primary key is? If so, do you have ONE field that is the
primary key or do you have multiple fields that are the primary key field?

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

narasimharaosama · Feb 5, 2007

I want to retain any one duplicated entry.And there are multiple
fields that are primary key field.

John Spencer · Feb 6, 2007

Probably the best way in this situation would be to build a new table
that matches your current table. Then add a multi-field unique index
based on your 8 fields.

Append all the records from the old table to the new table. Access will
add one of each unique record to the table. Once that is done and you
are satisified with the result you can delete the old table and re-name
the the new table with the old table's name.

Another option is to build a query getting the first of all the fields
other than the eight that are duplicates (group by them) and use that as
the basis of creating a new table

INSERT INTO NewTable (PkPartOne, PkPartTwo
,Field1, Field2, Field3, Field4
,Field5, Field6, Field7, Field8)
SELECT First(PKPartOne), First(PKPartTwo)
,Field1, Field2, Field3, Field4
,Field5, Field6, Field7, Field8
FROM TheMasterTable
GROUP BY ,Field1, Field2, Field3, Field4,
Field5, Field6, Field7, Field8

If those two don't work for you, post back

narasimharaosama · Feb 7, 2007

Thank you for the response.I tried doing the method u mentioned.I
tried to append to the new table.But Its not working properly.I have
seen that data in one of the 8 fields is missing in some rows.So I
guess am not able to apply unique index.Please suggest me What I can
try now.

Thanks

Narasimha

John Spencer · Feb 8, 2007

IF only one of the fields can have nulls, you could populate all the null
fields with a value that you know is not used. For instance, if this was a
number field that never holds a negative value, then populate it with a
negative value.

Then use the 8 field unique index method.

Then remove the compound index and use an update query to set the negative
value back to null

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

narasimharaosama · Feb 10, 2007

Its true that one of the fields be null.But being a large database
its proving tough for me to search which field in which row is null.

Thanks

Narasimha

John Spencer · Feb 10, 2007

Use an update query.

UPDATE YourTable
SET YourTable.YourField = "xxxxx!!##"
WHERE YourTable.YourField Is Null

After you de-duplicated your table using one of the techniques, then use
another update query

UPDATE YourNewTable
SET YourNewTable.YourField = Null
WHERE YourTable.YourField = "xxxxx!!##"

narasimharaosama · Feb 12, 2007

hii,

I have one more problem in this method.I am getting validation errors
when i tried to run the append query. I have seen if there are any
validation rules set for the fields.But there are no such rules
set.Still I am getting this error.Please help me out.

Narasimha

John Spencer · Feb 12, 2007

Of course, you are getting errors. Try ignoring them and let the append
happen anyway.

You should be getting append errors since you are trying to append records
where the values are duplicated for your 8 index field. If you go ahead and
append the database engine will pick one of the duplicated records to add to
the table and will ignore the others.

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

narasimharaosama · Feb 13, 2007

I am getting validation errors and that too in around large number,
greater than the number of duplicates.I know the number of duplicates
in the table by using duplicate query in ACCESS.Do you have any
solution for this?Please let me know.

Thank you

Narasimha

John Spencer · Feb 13, 2007

I am sorry, but obviously I do not understand your table structure. I have
nothing further to add towards solving your problem. Perhaps you should
start a new thread and see if someone else has some ideas to help you

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..

Deleting duplicates entries in Database using MS Access

narasimharaosama

Smartin

John Spencer

narasimharaosama

John Spencer

narasimharaosama

John Spencer

narasimharaosama

John Spencer

narasimharaosama

John Spencer

narasimharaosama

John Spencer

narasimharaosama

John Spencer