how do i eliminate duplicate records?

R

RobR

I have a spreadsheet with about 30,000 records. It's a combination of two
different databases. I need to eliminate all duplicate records (it's determined
to be a duplicate if there more than one item in column A has the same value).
Then what's left over will be a list of records that are in one database but not
the other. What's the best way to go about this? Thanks!
 
R

RobR

Actually my description is misleading, I don't want to eliminate only
the duplicate, what I want to do it if a duplicate is found, I want to
eliminate BOTH records.
 
D

Dave Peterson

I'd use a helper column adjacent to that key column:

=countif(a:a,a1)
and drag down
and let excel calculate

Then convert that column to values
Select the column
edit|copy
edit|paste special|values

Then filter on that helper column.
show the values greater than 1
and delete those visible rows.

Deleting may work better if you sort by that key column first.
 
D

Don Guillett

Assuming you can sort first, this should work

Sub deletebothifsame()
On Error Resume Next
For i = Cells(Rows.Count, "a").End(xlUp).Row To 1 Step -1
If Cells(i - 1, 1).Value = Cells(i, 1).Value Then
Cells(i - 1, 1).Resize(2).EntireRow.Delete
End If
Next i
End Sub
 
R

RobR

Thanks for the help, I'm ALMOST there. I'm just trying to
delete the rows now (also tried to just select and copy)
and am being told:

cannot create or use the data range reference because it is too complex. Try to:
- Use data that can be selected in one continuous rectangle
- use data from the same sheet
 
R

RobR

I think I got it. I just sorted by the helper column
instead of using a filter, then just scrolled down
until I hit the 2s and cut everything downward.
(At least that's the plan,haven't done it yet :)).

RobR said:
Thanks for the help, I'm ALMOST there. I'm just trying to
delete the rows now (also tried to just select and copy)
and am being told:

cannot create or use the data range reference because it is too complex. Try to:
- Use data that can be selected in one continuous rectangle
- use data from the same sheet
 
D

Dave Peterson

That was my (almost hidden) warning was about:

If you can sort by the helper column, then it will be in ascending order and
your delete process will work.

If you have to put the data back in its original order, I do this:

Insert another helper column (column A???)
type 1 in A1
type 2 in A2
select a1:a2 and drag down the column

Then you can sort by the "duplicate" helper column, delete the duplicates, then
resort by the "index" helper column.

When you're done, just delete that "index" helper column.
 
R

RobR

Thank you for all the help, I found my missing $4.4 million. Now I
just need someone smarter than me to figure out why it wasn't pulled
into the database :).
 
R

RobR

Thanks Don, wish I had seen this before I started down the other road.
I suspect I'll be doing this again though so I've bookmarked this thread.
Next time around I'll use your method.
 
D

Dave Peterson

Maybe it was just rounding error <vbg>.

Or check for the guy "vacationing" in the Bahamas!
Thank you for all the help, I found my missing $4.4 million. Now I
just need someone smarter than me to figure out why it wasn't pulled
into the database :).
 
R

RobR

Thanks, the values will just be 1 or 2 for my current issue.
2 if it's in both databases, 1 if it's in only the original database.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top