delete duplicate entries in a large worksheet

D

David

Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.
 
X

xlmate

try using Advance Filter

1. Go to Data on the menu bar
2. Select Filter
3. Chosse Advance Filter
4. check Copy to another location ( to keep the original data )
5. Select the range
6. Select a cell on the same worksheet which is out of the existing data range
7. check the Unique records only
8. click OK

HTH
--
Your feedback is very much appreciate, pls click on the Yes button below if
this posting is helpful.

Thank You

cheers, francis
 
C

Chip Pearson

Try some code like the following:


Sub AAA()

Dim LastRow As Long
Dim TestColumn As String
Dim RowNdx As Long
Dim TopRow As Long
Dim WS As Worksheet
Dim DeleteThese As Range

Set WS = ActiveSheet
TestColumn = "A" '<<<< column to test for duplicates
TopRow = 1 '<<<< top-most row of data to test.

With WS
LastRow = .Cells(.Rows.Count, TestColumn).End(xlUp).Row
For RowNdx = LastRow To TopRow Step -1
If Application.CountIf(.Range(.Cells(TopRow, TestColumn), _
.Cells(RowNdx, TestColumn)), _
.Cells(RowNdx, TestColumn)) > 1 Then
If DeleteThese Is Nothing Then
Set DeleteThese = .Rows(RowNdx)
Else
Set DeleteThese = _
Application.Union(DeleteThese, .Rows(RowNdx))
End If
End If
Next RowNdx
End With
If Not DeleteThese Is Nothing Then
DeleteThese.Delete
End If

End Sub


Change TestColumn to the letter of the column that is to be used to
test for duplicates. Change TopRow to a value > 1 if your worksheet
has some header rows that should not be tested.

Cordially,
Chip Pearson
Microsoft Most Valuable Professional
Excel Product Group, 1998 - 2009
Pearson Software Consulting, LLC
www.cpearson.com
(email on web site)
 
J

JB

For Sort BD:

Sub DeleteDuplicateClassic()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
[A1].Sort Key1:=Range("A2"), Order1:=xlAscending, Header:=xlGuess
For i = [A65000].End(xlUp).Row To 2 Step -1
If Cells(i, 1) = Cells(i - 1, 1) Then Rows(i).Delete
Next i
Application.Calculation = xlCalculationAutomatic
End Sub

Respect Order:2 s for 10.0000 rows and suppress 5%

Sub RespectOrderDictionary()
Set MonDico = CreateObject("Scripting.Dictionary")
Application.ScreenUpdating = False
i = 2
Do While Cells(i, "A") <> ""
If Not MonDico.Exists(Cells(i, "A") & Cells(i, "C")) Then
MonDico.Add Cells(i, "A") & Cells(i, "C"), Cells(i, "A") &
Cells(i, "C")
i = i + 1
Else
Rows(i).EntireRow.Delete
End If
Loop
End Sub

1,17 sec for 10.000 rows and 80% suppress:

Sub DeleteDuplicateQuick()
t = Timer()
Application.ScreenUpdating = False
[A1].Sort Key1:=Range("A2"), Order1:=xlAscending, _
Header:=xlGuess
Columns("b:b").Insert Shift:=xlToRight
[B1] = "ColB"
[B2].FormulaR1C1 = "=IF(RC[-1]=R[-1]C[-1],1,0)"
[B2].AutoFill Destination:=Range("B2:B" & [A65000].End(xlUp).Row)
[B:B].Value = [B:B].Value
[A2].CurrentRegion.Sort Key1:=Range("B2"), Order1:=xlAscending,
Header:=xlGuess
[B:B].Replace What:="1", Replacement:="", LookAt:=xlPart
Range("B2:B65000").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
Columns("b:b").Delete Shift:=xlToLeft
MsgBox Timer() - t
End Sub

http://cjoint.com/?bvpZzDBano

JB
http://boisgontierjacques.free.fr
 
D

David

Thanks for the responses everyone. Because I don't know anything about VBA
I'm going to try this option first. One follow up question. Is there a way
to tell how many rows were filtered or how many remain?
 
X

xlmate

Hi David,
You may also try this formula
I am assuming that your range data is in Col A and duplicates can be found
in the same Col. Change the range to yours

=IF(COUNTIF($A$2:A2,A2)>1,"TRUE","")

copy this formula and select a cell, let C2, Ctrl-V into the formula bar.
Copy as far down as your data is.
Filter for Blanks in Col C, Excel will show all entries with a blank in Col C
Copy this set of data to a new sheet.

or you can filter for "TRUE" in col C and delete all entries with the word
"TRUE"
either way will give you a set of unique data range.

pls note to save a back up before doing any of the suggestions.

To count the number of entries, use =count(A:A)

HTH
--
Your feedback is very much appreciate, pls click on the Yes button below if
this posting is helpful.

Thank You

cheers, francis
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top