Duplicates in Rows

mark · Aug 6, 2006

Hi All,
I've been given at work a file containing a table with over 4,000 rows
and data in 8 columns.

What I have to do is compare each cell in each of the 8 columns per row
with the very next cell below it on the next row.
If all of the 8 cells below are exactly the same as the cells above
then delete that row and then the comparing again with the now new next
row and so on and so on and gradually go through all the 4,000+ rows
It's taking me ages as I'm just eyeballiing as I couldn't see how auto
filter could help or how I could do it with code.
A sample is below
Any help is greatly appreciated as I gave up late on Friday after hours
of doing it manually

Mark

Class Fname Lname Regn Facility other columns.......
ss Charles Pott 1346 Else
vm Charles Pott 1346 Some
vm Charles Pott 1346 Some - this to be deleted
ss Pete Blogg 7890 Else
ss Pete Blogg 7890 Else - this to be deleted
hm PC Blogg 7890 Else
hm Pete Blogg 7890 Some
hm PC Blogg 7890 Some - this NOT to be deleted

Lars-Åke Aspelin · Aug 6, 2006

Hi All,
I've been given at work a file containing a table with over 4,000 rows
and data in 8 columns.

What I have to do is compare each cell in each of the 8 columns per row
with the very next cell below it on the next row.
If all of the 8 cells below are exactly the same as the cells above
then delete that row and then the comparing again with the now new next
row and so on and so on and gradually go through all the 4,000+ rows
It's taking me ages as I'm just eyeballiing as I couldn't see how auto
filter could help or how I could do it with code.
A sample is below
Any help is greatly appreciated as I gave up late on Friday after hours
of doing it manually

Mark

Class Fname Lname Regn Facility other columns.......
ss Charles Pott 1346 Else
vm Charles Pott 1346 Some
vm Charles Pott 1346 Some - this to be deleted
ss Pete Blogg 7890 Else
ss Pete Blogg 7890 Else - this to be deleted
hm PC Blogg 7890 Else
hm Pete Blogg 7890 Some
hm PC Blogg 7890 Some - this NOT to be deleted

Hi mark,

If this is something that should be done frequently I suggest that you
use a macro to automate the task. If it is a task to be done just once
you can do as follows, assuming your data are in columns A through H
and rows 1 through 4000 and that you can use the rest of the sheet as
working space:

In cell I1 enter the formula =ROW()
In cell J1 through Q1 just copy the data from A1 through H1.
In cell J2 enter the following formula
=IF(AND($A2=$A1;$B2=$B1 ...and so on... ;$H2=$H1);"";A2)
copy this forumula to K2 through Q2 andf then copy I2 through Q2 down
to I4000 through Q4000.
All unique rows should now be visible in columns J through Q and the
duplicates should be blanked.

Now select cells I1 through Q4000 and copy it (Edit->Copy)
Select cell R1 and paste the data using Edit->Paste Special->Values
(columns R through Y will be filled with your data)

Finally use Data->Sort (sort columns R through Y by column R) to get
your blank rows at the bottom and your unique data packed at the top.
(The R column now secures that the original row order is preserved)

Now you can delete columns A through R and save the result.

Hope this helps. / Lars-Åke

Leo Heuser · Aug 6, 2006

mark said:
Hi All,
I've been given at work a file containing a table with over 4,000 rows
and data in 8 columns.

What I have to do is compare each cell in each of the 8 columns per row
with the very next cell below it on the next row.
If all of the 8 cells below are exactly the same as the cells above
then delete that row and then the comparing again with the now new next
row and so on and so on and gradually go through all the 4,000+ rows
It's taking me ages as I'm just eyeballiing as I couldn't see how auto
filter could help or how I could do it with code.
A sample is below
Any help is greatly appreciated as I gave up late on Friday after hours
of doing it manually

Mark

Class Fname Lname Regn Facility other columns.......
ss Charles Pott 1346 Else
vm Charles Pott 1346 Some
vm Charles Pott 1346 Some - this to be deleted
ss Pete Blogg 7890 Else
ss Pete Blogg 7890 Else - this to be deleted
hm PC Blogg 7890 Else
hm Pete Blogg 7890 Some
hm PC Blogg 7890 Some - this NOT to be deleted

Hi Mark

Assuming table in columns A:H.

One way to do it:

1. Select a random cell in the table.
2. Choose Data > Filter > Advanced filter
3. Checkmark "Uniques only" (or similar). OK.
4. Select columns A:H and copy.
5. Select e.g. K1 (assuming columns K:R are empty)
and press <Enter>
6. Only the visible cells are pasted to the range.
7. Chose Data > Filter > Show all

In point 5 you might as well chose a cell in another sheet.

Lars-Åke Aspelin · Aug 6, 2006

Hi Mark

Assuming table in columns A:H.

One way to do it:

1. Select a random cell in the table.
2. Choose Data > Filter > Advanced filter
3. Checkmark "Uniques only" (or similar). OK.
4. Select columns A:H and copy.
5. Select e.g. K1 (assuming columns K:R are empty)
and press <Enter>
6. Only the visible cells are pasted to the range.
7. Chose Data > Filter > Show all

In point 5 you might as well chose a cell in another sheet.

Well, that was certainly a less complicated solution than the one I
posted and it probably does what the OP really needs.

However, the Advanced Filter solution also filters out records that
are not adjacent which might not be the intention. The problem
description is not fully clear in this respect if you ask me.

"What I have to do is compare each cell in each of the 8 columns per
row with the very next cell below it on the next row. If all of the 8
cells below are exactly the same as the cells above then delete that
row and then the comparing again with the now new next row and so on
and so on and gradually go through all the 4,000+ rows"

You have to guess a little whether the task is to remove duplicates
wherever they are or just adjacent duplicates and the problem with
this problem description is that is focuses more on how to accomplish
the result (the algorithm) than describing exactly what to achieve,
i.e. the expected result.
The "and so on and so on" part of the algorithm is where you have to
guess.

One interpretation (that is supported by my solution) is that the
following set of data

some data
some data
something else
something else
some data
something else

should result in the following:

some data
something else
some data
something else

Another interpreation (that is supported by Leos solution) is that it
should result in the following:

some data
something else

Examples can often be helpful, but the example given by mark gives no
hint what to do with non adjacent duplicates (or a statment that such
never exists).
If there would have been one more record at the very end with

hm PC Blogg 7890 Else

together with a statement on whether that record should be kept or not
we wouldn't have to guess.

Lars-Åke

mark · Aug 7, 2006

Lars, your first response is spot on and thank you very, very much.
You are correct in that I did not want non-adjacent duplicates deleted,
I should have made it clearer as you suspected.
Once again thank you,
Mark

Duplicates in Rows

mark

Lars-Åke Aspelin

Leo Heuser

Lars-Åke Aspelin

mark