locating duplicate records in a 300,000 record file

R

robro

I previously had help with writing a macro to eliminate duplicate
records, and it worked great which told me there are two duplicate
records. Now I need to quickly identify them.

There is one column of numbers, sorted in numeric order. I just need
to place an X
in a helper column for each duplicate record so I can then sort by the
helper column. Basically for this to be fast, I just want to compare
one cell with the one above it. I've yet to figure out how to do
that. Any help appreciated! I've googled up a storm and all I'm
finding is deleting dupes or placing an X in a helper column but
comparing all records in column A which will take too long with
300,000 records.

Thanks!
 
J

JP

What Excel version do you have? I assume 2007, because you can't have
300k rows in any other version.

Let's say your data runs from A2:A300000 (with header in A1)

Column B will be the helper column. Starting from B3, enter this
formula:

=IF(COUNTIF(A2:A3,A3)=1,"X","")

Fill down and then Copy>Paste values. Any cell with an "X" has a
duplicate in the cell immediately above the corresponding cell in
column A. For example if A3 and A4 were dupes, cell B4 would have an
"X" in it.

Keep in mind this won't catch dupes found elsewhere in the range.

HTH,
JP
 
B

Billy Liddel

Robro

Sub MarkDupes()
'Assumes data in column A ' Adjust to suit
Dim i As Long, nr As Long
Range("A1").Select
nr = ActiveCell.CurrentRegion.Rows.Count
For i = 2 To nr
If Cells(i, 1) = Cells(i - 1, 1) Then
Cells(i, 2) = "X"
End If
Next i
End Sub

Will mark cells in column b

Peter
 
R

robro

Thanks for both responses. I guess my first macro did something odd
because using this method, there are no duplicates.
Damn, I'm trying to figure out why a SQL data import is crashing with
duplicate errors and my SQL guy is unavailable so
I'm resorting to guessing and using Excel (yes Excel 2007, I was very
excited to find it could handle so many rows when
it first came out. That was one of the first features I checked for).
 
P

Pete_UK

Why not sort the data first and then put this in B2:

=IF(A1=A2,"X","")

and copy down? Then apply a filter to column B to locate all the X's
(duplicates).

Hope this helps.

Pete
 
T

T. Valko

Excel 2007

Assume the range is A1:A300000
Select the range A1:A300000
Goto Home tab>Styles>Conditional Formatting>New rule>Format only unique or
duplicate values

From the drop down select duplicate
Click the Format button
Select the desired style(s)
OK out
 
S

Shane Devenshire

Hi,

Since you are using 2007, if you want to remove the duplicates you can choose the command Data, Remove Duplicates. As already mentioned you can use conditional formatting to mark the duplicates. But with 300000 you may want to locate the found records quickly, so:

1. After applying the conditional format mentioned in a previous reply
2. Select the 300000 rows and choose the command Home, Sort & Filter, Custom Filter
3. In the Sort box under Sort by pick the column you want to sort on, under Sort on pick Cell Color, under Order select No Cell Color and On Bottom, then click OK.

This will put all the duplicates at the top.

Cheers,
Shane Devenshire
Microsoft Excel MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top