Remove Duplicates logic

Bob · Oct 8, 2008

I have a spreadsheet with duplicate rows. When I run the remove duplicates
task from the data tab, I get inconsistent results. For example, the
spreadsheet contains the following lines
Line 1: a 123 1
Line 2: a 123 2
Line 3: b 123 1
Line 4 : b 123 2

I check the columns to compare the first and second columns to determine
duplicates, then run the task to remove duplicates

The result is

New Line 1: a 123 1
New Line 2: b 123 2

What logic is being used to select the first duplicate of a pair to be saved
(line 1) and then to select the second duplicate of a pair (line 4)?

The actual data that I am using is financial data extracted to a spreadsheet
and then merged into my master sheet. I want to be able to save the latest
version of the data (the last column is used as the identifier), and remove
the previous version of the data (eg. in the example, save the values with
the 2 in the last column and eliminate the duplicates which happen to contain
the 1 in the previous column). In the actual data the column with the
version information is actually a date.

Bob · Oct 8, 2008

Thanks, the process that you outline and the logic you describe is how I
thought that it would work. Unfortunately that is not the result that I am
getting. The first of each duplicated row is not necessarily the row
maintained. I have sorted the data in date order, but have not used the date
field in my duplicate identification criteria.

Any other suggestions?

~L · Oct 8, 2008

Include the date in your duplicate identification criteria selection, but
uncheck the box for the date column when the menu comes up. The duplicate
finder will evaluate every checked item against the other items within the
column. With the column sorted by date newest to oldest,, only older
duplicates will be eliminated.

Unless...

If date isn't included, the a/b 123 data will be deleted and shifted but the
dates will not (and your data will be incorrect because the row alignment has
changed). If teh date column is included but not unchecked, nothing will be
deleted because the different dates make the rows not duplicates.

If it still isn't working, check to make sure a doesn't equal b (for
example, if a cell displays 1 but is formatted a certain way, it might
actually be equal to 1.234). Extra spaces are often the culprit when
comparing text (" Hi there" "Hi there " and Hi there" are not equal to "Hi
there").

In the case of missing rows, the duplicate finder can eliminate items so
that if a row is hidden, an item can shift from a non-hidden row to a hidden
one. If a hidden row falls within the selected range, it will be evaluated.

And if none of that works, I must not be clear on the problem.

Duplicates-to id when occurs in 3 columns for unique dup in 4th co	1	Jul 15, 2009
Duplicates and unique values	1	Feb 3, 2009
Unique data	2	Sep 30, 2009
not exact duplicates...	3	Jan 19, 2009
Finding duplicates	2	Feb 10, 2010
Augmenting an integer in a column to reflect duplicates in another	23	Jun 9, 2014
formula required	7	Dec 24, 2008
Excel Comparing two columns	1	Nov 8, 2019

Remove Duplicates logic

Bob

Bob

~L

Ask a Question

Similar Threads