PC Review


Reply
Thread Tools Rate Thread

Could anyone help me to remove duplicate rows?

 
 
Leon
Guest
Posts: n/a
 
      21st Jul 2003
I have a table that has more than 1 milion rows so practically it is
impossible to remove all duplicate rows by hand.
Could you help me to remove those duplicate rows at all?
This table doesn't have primary key( identity ) column.

Say Table name => MyTable
Column => Col1,Col2,Col3
No primary key column here.

Thanks a lot in advance.
 
Reply With Quote
 
 
 
 
Brian P. Hammer
Guest
Posts: n/a
 
      21st Jul 2003
You can use the duplicate record query in Access to find them.

You can create an Access table matching the one with the duplicates and then
set on field as no duplicates and import the records. This will remove the
duplicate records.

Or using code:

SELECT Table1.Field1, Table1.ID
FROM Table1
WHERE (((Table1.Field1) In (SELECT [Field1] FROM [Table1] As Tmp GROUP BY
[Field1] HAVING Count(*)>1 )))
ORDER BY Table1.Field1;

And then just execute a delete but be careful, this returns all duplicate
records not just the duplicates of x

HTH,
--
Brian P. Hammer
"Leon" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> I have a table that has more than 1 milion rows so practically it is
> impossible to remove all duplicate rows by hand.
> Could you help me to remove those duplicate rows at all?
> This table doesn't have primary key( identity ) column.
>
> Say Table name => MyTable
> Column => Col1,Col2,Col3
> No primary key column here.
>
> Thanks a lot in advance.



 
Reply With Quote
 
Stephen Muecke
Guest
Posts: n/a
 
      21st Jul 2003
Leon,
Are you populating the table from a database? If so, you might be able to
use DISTINCT in you SELECT statement to avoid loading them in the first
place.
Another approach (I haven't tested this!) might be to set constraints before
filling the table (a PrimaryKey comprising all 3 columns) and handle the
FillError event of the DataAdaptor

Stephen

"Leon" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> I have a table that has more than 1 milion rows so practically it is
> impossible to remove all duplicate rows by hand.
> Could you help me to remove those duplicate rows at all?
> This table doesn't have primary key( identity ) column.
>
> Say Table name => MyTable
> Column => Col1,Col2,Col3
> No primary key column here.
>
> Thanks a lot in advance.



 
Reply With Quote
 
Bob Grommes
Guest
Posts: n/a
 
      21st Jul 2003
Brian,

I'm not sure that would work if Field1 is not the PK. What Leon is saying,
if I understand him correctly, is he has 3 fields in the table and they are
effectively the PK. So it would be something along the lines of:

SELECT Field1 + Field2 + Field3 FROM MyTable
GROUP BY Field1 + Field2 + Field3 HAVING COUNT(*) > 1
ORDER BY Field1,Field2,Field3

That will give you one copy of each duplicate; then, for each duplicate I'd
probably save the 3 field values, delete each group of records in the
underlying table and then re-insert one record:

/* Inside a loop scanning through the above result */
@@FieldOne = Field1
@@FieldTwo = Field2
@@FieldThree = Field3
DELETE FROM MyTable WHERE Field1 = @@FieldOne AND Field2 = @@FieldTwo AND
Field3 = @@FieldThree
INSERT INTO MyTable (Field1,Field2,Field3) VALUES
(@@FieldOne,@@FieldTwo,@@FieldThree)

.... but, I'd wrap all that in a transaction for safety, to be on the safe
side.

And of course, experiment on a copy of the original table for safety's sake.

--Bob


"Brian P. Hammer" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> You can use the duplicate record query in Access to find them.
>
> You can create an Access table matching the one with the duplicates and

then
> set on field as no duplicates and import the records. This will remove the
> duplicate records.
>
> Or using code:
>
> SELECT Table1.Field1, Table1.ID
> FROM Table1
> WHERE (((Table1.Field1) In (SELECT [Field1] FROM [Table1] As Tmp GROUP BY
> [Field1] HAVING Count(*)>1 )))
> ORDER BY Table1.Field1;
>
> And then just execute a delete but be careful, this returns all duplicate
> records not just the duplicates of x
>
> HTH,
> --
> Brian P. Hammer
> "Leon" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> > I have a table that has more than 1 milion rows so practically it is
> > impossible to remove all duplicate rows by hand.
> > Could you help me to remove those duplicate rows at all?
> > This table doesn't have primary key( identity ) column.
> >
> > Say Table name => MyTable
> > Column => Col1,Col2,Col3
> > No primary key column here.
> >
> > Thanks a lot in advance.

>
>



 
Reply With Quote
 
TaeHo Yoo
Guest
Posts: n/a
 
      21st Jul 2003
First of all, thanks a lot for all of you.
I think Bob Grommes's solution sounds good.
I might try this first and see how it goes.

Thanks again for all of you to help me out.

I will let you know the result.




*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
 
Reply With Quote
 
TaeHo Yoo
Guest
Posts: n/a
 
      21st Jul 2003
What I did was
-----------------------------------------------------------
DECLARE @FieldOne as int
DECLARE @FieldTwo as varchar(50)
DECLARE @FieldThree as varchar(40)
DECLARE @FieldFour as varchar(50)
DECLARE @FieldFive as varchar(50)
DECLARE @value as int

BEGIN TRAN DELETE_Du
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
While @FieldOne <> NULL
BEGIN
DELETE [5] WHERE @FieldOne=[1] and @FieldTwo=[2] and @FieldThree=[3]
and @FieldFour=[4] and @FieldFive=[5] and @value=value

INSERT INTO [5] ([1],[2],[3],[4],[5],value) VALUES
(@FieldOne,@FieldTwo,@FieldThree,@FieldFour,@FieldFive,@value)
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
END

COMMIT TRAN DELETE_Du

-----------------------------------------------------------
And the message I had was "the command(s) completed successfully" but
the duplicates are still there.

What did I miss?

Thanks a lot~~~~~~~~




*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
 
Reply With Quote
 
TaeHo Yoo
Guest
Posts: n/a
 
      21st Jul 2003
Thanks a lot for your help.
but still I have a problem.
what I did was,
------------------------------------------------------
DECLARE @FieldOne as int
DECLARE @FieldTwo as varchar(50)
DECLARE @FieldThree as varchar(40)
DECLARE @FieldFour as varchar(50)
DECLARE @FieldFive as varchar(50)
DECLARE @value as int

BEGIN TRAN DELETE_Du
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
While @FieldOne <> NULL
BEGIN
DELETE [5] WHERE @FieldOne=[1] and @FieldTwo=[2] and @FieldThree=[3]
and @FieldFour=[4] and @FieldFive=[5] and @value=value

INSERT INTO [5] ([1],[2],[3],[4],[5],value) VALUES
(@FieldOne,@FieldTwo,@FieldThree,@FieldFour,@FieldFive,@value)
SELECT
@FieldOne=[1],@FieldTwo=[2],@FieldThree=[3],@FieldFour=[4],@FieldFive=[5
],@value=value FROM [5]
GROUP BY [1],[2],[3],[4],[5],value HAVING COUNT(*) > 1
END

COMMIT TRAN DELETE_Du
---------------------------------------------------------
and this end of this query, it said "The command(s) completed
successfully."
Did I miss something?

Thanks a lot..


*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
 
Reply With Quote
 
mritacco@gmail.com
Guest
Posts: n/a
 
      28th Sep 2004
This is the method I use to remove duplicate records from a table. The
first three columns in this example are assumed to be the PK.

1. Backup the table or the database. I usually just do a SELECT * INTO
FROM SOURCE_TABLE

2. Find all ROWS that have a duplicate and stage.

SELECT COL1, COL2, COL3, COL4, count(*)
into #TEMP_DATA
FROM SOURCE_TABLE
GROUP BY COL1, COL2, COL3, COL4
HAVING count(*) > 1

3. REMOVE "ALL" records which have duplicates from the source table.

DELETE SOURCE_TABLE
FROM SOURCE_TABLE a, #TEMP_DATA b
WHERE a.COL1 = b.COL1
and a.COL2 = b.COL2
and a.COL3 = b.COL3

4. INSERT THE "SINGLE" RECORDS BACK INTO THE TABLE

INSERT INTO SOURCE_TABLE (COL1, COL2, COL3, COL4)
SELECT COL1, COL2, COL3, COL4
FROM #TEMP_DATA2

5. Duplicates are now gone. No Rows returned.

SELECT COL1, COL2, COL3, COL4, count(*)
FROM SOURCE_TABLE
GROUP BY COL1, COL2, COL3, COL4
HAVING count(*) > 1

6. Delete Backup table.

 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to completely remove duplicate rows in vba geniusideas Microsoft Excel Programming 2 27th Aug 2009 10:46 PM
Remove Duplicate Rows Philidec Microsoft Excel Worksheet Functions 0 26th Apr 2009 10:08 PM
How do I remove Duplicate rows? =?Utf-8?B?ODUyMjU=?= Microsoft Excel Misc 15 9th Mar 2007 11:41 PM
Remove duplicate rows =?Utf-8?B?dGJhc2lj?= Microsoft Excel Misc 1 15th Apr 2004 08:52 PM
Could anyone help me to remove duplicate rows? TaeHo Yoo Microsoft ASP .NET 1 21st Jul 2003 08:41 AM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 05:51 AM.