How? compare two worksheets and identify rows of data that appear on both.

  • Thread starter Thread starter simon steel
  • Start date Start date
S

simon steel

Ok,

Suppose I have 2 worksheets.

Sheet 1 has 1000 lines of data spanning xNumber of columns.

Sheet 2 has 500 lines of data, also spanning the same number of columns.
An unknown number of identical lines of data appear on both sheets.

How can I get to a point where I can build one worksheet that is a composite
of both sheets - such that no identical line of data appears twice on the
final sheet?


Thanks in advance, Simon.
 
You could do this several ways. It really depends on your data
integrity and what sort of data it is.

-- Use a VLOOKUP OR INDEX/MATCH formula to compare the 500 row ss
against the 1000 row ss. If you get a hit, delete the row, then paste
the two wkbks together.

OR

-- Paste the workbooks together, sort by column of your choice, then
use a procedure such as this to remove the duplicates.

Sub Deletedupes()
Dim rng As Excel.Range
Dim Col As Long
Dim X As Long
Dim r As Long
Application.ScreenUpdating = False

Set rng = ActiveSheet.UsedRange.Rows
Col = InputBox("What Column to use?" & vbCrLf & vbCrLf & "ex: for
column A, enter 1.")

X = rng.Rows.count

For r = X To 2 Step -1

If Cells(r - 1, Col) = Cells(r, Col) Then
' If Cells(r - 1, Col + 10) = Cells(r, Col + 10) Then
Cells(r, Col).EntireRow.Delete
' Else
' Cells(r - 1, Col).EntireRow.Delete
' End If
End If
Next r

Application.ScreenUpdating = True
Application.StatusBar = False
End Sub



HTH,
JP
 
JP thanks for help.
i'll look at these - they should get me started.

Presume the second method is a macro?
any chance of a very quick overiew of what each bit of the macro is doing -
I've little (almost none) experience with macros.

Simon


You could do this several ways. It really depends on your data
integrity and what sort of data it is.

-- Use a VLOOKUP OR INDEX/MATCH formula to compare the 500 row ss
against the 1000 row ss. If you get a hit, delete the row, then paste
the two wkbks together.

OR

-- Paste the workbooks together, sort by column of your choice, then
use a procedure such as this to remove the duplicates.

Sub Deletedupes()
Dim rng As Excel.Range
Dim Col As Long
Dim X As Long
Dim r As Long
Application.ScreenUpdating = False

Set rng = ActiveSheet.UsedRange.Rows
Col = InputBox("What Column to use?" & vbCrLf & vbCrLf & "ex: for
column A, enter 1.")

X = rng.Rows.count

For r = X To 2 Step -1

If Cells(r - 1, Col) = Cells(r, Col) Then
' If Cells(r - 1, Col + 10) = Cells(r, Col + 10) Then
Cells(r, Col).EntireRow.Delete
' Else
' Cells(r - 1, Col).EntireRow.Delete
' End If
End If
Next r

Application.ScreenUpdating = True
Application.StatusBar = False
End Sub



HTH,
JP
 
Sorry about that. Check out this site for more info on how to install.

http://www.rondebruin.nl/code.htm

What it does is first it asks you for what column you want to loop
through. Then it steps through the column and if it finds two rows
with the same information (presumably, because they are duplicates),
it deletes the first one. It is necessary to step backwards through
the rows when you are deleting, to ensure that rows aren't skipped.

Keep in mind that anything the macro does is permanent -- you should
make a backup of both workbooks and be absolutely sure you are doing
the right thing before allowing it to delete your data.

HTH,
JP
 
Ah...
Then it may not work.

not sure if you're say it will delete rows where for a given column, the
data repeats.
i may have the same data in a cell in given column that is repeated in other
cells in that column but that may not mean it is a duplicated line.
It is the rows where the whole row (some 15 cells across) is duplicated that
I want to strip from the data.
Can you clarify if this is the outcome?

I'll try it tomorrow anyway and see what the effect is (Data is at work, I'm
at home).

Simon



Sorry about that. Check out this site for more info on how to install.

http://www.rondebruin.nl/code.htm

What it does is first it asks you for what column you want to loop
through. Then it steps through the column and if it finds two rows
with the same information (presumably, because they are duplicates),
it deletes the first one. It is necessary to step backwards through
the rows when you are deleting, to ensure that rows aren't skipped.

Keep in mind that anything the macro does is permanent -- you should
make a backup of both workbooks and be absolutely sure you are doing
the right thing before allowing it to delete your data.

HTH,
JP
 
The idea is that, if you have true duplicates (i.e. two rows with the
exact same information in each cell), and the worksheet is sorted, the
code would compare one cell from each row and if they are the same,
delete one. i.e. if you had a list of addresses, you could compare the
street number column.

Hopefully this will give you a visual representation of what would
happen.

http://www.cpearson.com/excel/deleting.htm


HTH,
JP
 
simon steel said:
Ok,

Suppose I have 2 worksheets.

Sheet 1 has 1000 lines of data spanning xNumber of columns.

Sheet 2 has 500 lines of data, also spanning the same number of
columns.
An unknown number of identical lines of data appear on both sheets.

How can I get to a point where I can build one worksheet that is a
composite of both sheets - such that no identical line of data appears
twice on the final sheet?


Thanks in advance, Simon.
Have you tried Excel's compare option (see Help).

Briefly, if you open both Worksheets, there will be an option under
Window menu to compare A with B.
 
Back
Top