Highlight/remove duplicate rows

B

Blasting Cap

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC
 
G

Guest

I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.
 
B

Blasting Cap

Can you post some sample code that's part of your routine?

I'm not sure if the 2 of 3 match would not give me a lot of false
positives. For instance, if I were comparing grocery stores, I could
conceivably have any number of correct matches within the same city.

What I need to do is to remove/standardize differences - Street becomes
ST., Avenue becomes AVE and so on.

BC
 
G

Guest

This is a good example. It is a quick test. See how well it works. Modify
as necessary. Add a workseet DupNames to check results. The code will
probably take a few minutes to run with 10000 lines. It is comapring every
line with every other line with is (Rows * (Rows - 1))/2

Sub cmpcompnames()
Const DupNames = "DupNames"

LastRows = Cells(Rows.Count, "A").End(xlUp).Row

Duprows = 1
With Worksheet(DupNames)

For I = 1 To (LastRows - 1)
For J = (I + 1) To LastRow

CompareCount = 0
If Cells(I, "A") = Cells(J, "A") Then
CompareCount = 1
End If
If Cells(I, "B") = Cells(J, "B") Then
CompareCount = CompareCount + 1
End If
If Cells(I, "C") = Cells(J, "C") Then
CompareCount = CompareCount + 1
End If

'check for near and perfect matches
If CompareCount > 2 Then
.Cells(RowCount, "A") = Cells(I, "A")
.Cells(RowCount, "B") = Cells(I, "B")
.Cells(RowCount, "C") = Cells(I, "C")
.Cells(RowCount, "D") = Cells(J, "A")
.Cells(RowCount, "E") = Cells(J, "B")
.Cells(RowCount, "F") = Cells(J, "C")
Duprows = Duprows + 1
End If
Next J
Next I
End With

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top