Could anyone help me to remove duplicate rows?

Leon · Jul 21, 2003

I have a table that has more than 1 milion rows so practically it is
impossible to remove all duplicate rows by hand.
Could you help me to remove those duplicate rows at all?
This table doesn't have primary key( identity ) column.

Say Table name => MyTable
Column => Col1,Col2,Col3
No primary key column here.

Thanks a lot in advance.

Brian P. Hammer · Jul 21, 2003

You can use the duplicate record query in Access to find them.

You can create an Access table matching the one with the duplicates and then
set on field as no duplicates and import the records. This will remove the
duplicate records.

Or using code:

SELECT Table1.Field1, Table1.ID
FROM Table1
WHERE (((Table1.Field1) In (SELECT [Field1] FROM [Table1] As Tmp GROUP BY
[Field1] HAVING Count(*)>1 )))
ORDER BY Table1.Field1;

And then just execute a delete but be careful, this returns all duplicate
records not just the duplicates of x

HTH,

Stephen Muecke · Jul 21, 2003

Leon,
Are you populating the table from a database? If so, you might be able to
use DISTINCT in you SELECT statement to avoid loading them in the first
place.
Another approach (I haven't tested this!) might be to set constraints before
filling the table (a PrimaryKey comprising all 3 columns) and handle the
FillError event of the DataAdaptor

Stephen

Bob Grommes · Jul 21, 2003

Brian,

I'm not sure that would work if Field1 is not the PK. What Leon is saying,
if I understand him correctly, is he has 3 fields in the table and they are
effectively the PK. So it would be something along the lines of:

SELECT Field1 + Field2 + Field3 FROM MyTable
GROUP BY Field1 + Field2 + Field3 HAVING COUNT(*) > 1
ORDER BY Field1,Field2,Field3

That will give you one copy of each duplicate; then, for each duplicate I'd
probably save the 3 field values, delete each group of records in the
underlying table and then re-insert one record:

/* Inside a loop scanning through the above result */
@@FieldOne = Field1
@@FieldTwo = Field2
@@FieldThree = Field3
DELETE FROM MyTable WHERE Field1 = @@FieldOne AND Field2 = @@FieldTwo AND
Field3 = @@FieldThree
INSERT INTO MyTable (Field1,Field2,Field3) VALUES
(@@FieldOne,@@FieldTwo,@@FieldThree)

.... but, I'd wrap all that in a transaction for safety, to be on the safe
side.

And of course, experiment on a copy of the original table for safety's sake.

--Bob

Brian P. Hammer said:
You can use the duplicate record query in Access to find them.

You can create an Access table matching the one with the duplicates and then
set on field as no duplicates and import the records. This will remove the
duplicate records.

Or using code:

SELECT Table1.Field1, Table1.ID
FROM Table1
WHERE (((Table1.Field1) In (SELECT [Field1] FROM [Table1] As Tmp GROUP BY
[Field1] HAVING Count(*)>1 )))
ORDER BY Table1.Field1;

And then just execute a delete but be careful, this returns all duplicate
records not just the duplicates of x

HTH,
--
Brian P. Hammer

Leon said:

I have a table that has more than 1 milion rows so practically it is
impossible to remove all duplicate rows by hand.
Could you help me to remove those duplicate rows at all?
This table doesn't have primary key( identity ) column.

Say Table name => MyTable
Column => Col1,Col2,Col3
No primary key column here.

Thanks a lot in advance.

Click to expand...

TaeHo Yoo · Jul 21, 2003

First of all, thanks a lot for all of you.
I think Bob Grommes's solution sounds good.
I might try this first and see how it goes.

Thanks again for all of you to help me out.

I will let you know the result.

TaeHo Yoo · Jul 21, 2003

What I did was
-----------------------------------------------------------
DECLARE @FieldOne as int
DECLARE @FieldTwo as varchar(50)
DECLARE @FieldThree as varchar(40)
DECLARE @FieldFour as varchar(50)
DECLARE @FieldFive as varchar(50)
DECLARE @value as int

BEGIN TRAN DELETE_Du
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
While @FieldOne <> NULL
BEGIN
DELETE [5] WHERE @FieldOne=[1] and @FieldTwo=[2] and @FieldThree=[3]
and @FieldFour=[4] and @FieldFive=[5] and @value=value

INSERT INTO [5] ([1],[2],[3],[4],[5],value) VALUES
(@FieldOne,@FieldTwo,@FieldThree,@FieldFour,@FieldFive,@value)
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
END

COMMIT TRAN DELETE_Du

-----------------------------------------------------------
And the message I had was "the command(s) completed successfully" but
the duplicates are still there.

What did I miss?

Thanks a lot~~~~~~~~

TaeHo Yoo · Jul 21, 2003

Thanks a lot for your help.
but still I have a problem.
what I did was,
------------------------------------------------------
DECLARE @FieldOne as int
DECLARE @FieldTwo as varchar(50)
DECLARE @FieldThree as varchar(40)
DECLARE @FieldFour as varchar(50)
DECLARE @FieldFive as varchar(50)
DECLARE @value as int

BEGIN TRAN DELETE_Du
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
While @FieldOne <> NULL
BEGIN
DELETE [5] WHERE @FieldOne=[1] and @FieldTwo=[2] and @FieldThree=[3]
and @FieldFour=[4] and @FieldFive=[5] and @value=value

INSERT INTO [5] ([1],[2],[3],[4],[5],value) VALUES
(@FieldOne,@FieldTwo,@FieldThree,@FieldFour,@FieldFive,@value)
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
END

COMMIT TRAN DELETE_Du
---------------------------------------------------------
and this end of this query, it said "The command(s) completed
successfully."
Did I miss something?

Thanks a lot..

mritacco · Sep 28, 2004

This is the method I use to remove duplicate records from a table. The
first three columns in this example are assumed to be the PK.

1. Backup the table or the database. I usually just do a SELECT * INTO
FROM SOURCE_TABLE

2. Find all ROWS that have a duplicate and stage.

SELECT COL1, COL2, COL3, COL4, count(*)
into #TEMP_DATA
FROM SOURCE_TABLE
GROUP BY COL1, COL2, COL3, COL4
HAVING count(*) > 1

3. REMOVE "ALL" records which have duplicates from the source table.

DELETE SOURCE_TABLE
FROM SOURCE_TABLE a, #TEMP_DATA b
WHERE a.COL1 = b.COL1
and a.COL2 = b.COL2
and a.COL3 = b.COL3

4. INSERT THE "SINGLE" RECORDS BACK INTO THE TABLE

INSERT INTO SOURCE_TABLE (COL1, COL2, COL3, COL4)
SELECT COL1, COL2, COL3, COL4
FROM #TEMP_DATA2

5. Duplicates are now gone. No Rows returned.

SELECT COL1, COL2, COL3, COL4, count(*)
FROM SOURCE_TABLE
GROUP BY COL1, COL2, COL3, COL4
HAVING count(*) > 1

6. Delete Backup table.

Could anyone help me to remove duplicate rows?	1	Jul 21, 2003
SELECT Distinct from DataTable, without using SQL	4	Jan 27, 2005
Removing rows from a DataTable is VERY slow	32	Nov 30, 2004
Dataset filtering problem	3	Sep 13, 2003
Still having problems inserting a row into a table with uniqueidentifier column...	1	Apr 20, 2004
"Select Distinct ..." or something else?	2	Oct 12, 2004
Mona - Help!!! - or - Why won't my DataGridTableStyle apply?	1	Jul 22, 2005
How can you duplicate rows using SQL...	1	Jun 9, 2005

Could anyone help me to remove duplicate rows?

Leon

Brian P. Hammer

Stephen Muecke

Bob Grommes

TaeHo Yoo

TaeHo Yoo

TaeHo Yoo

mritacco

Ask a Question

Similar Threads