Remove duplicate rows in Excel


john.hood

This is an MS problem for which there is a freeware fix. I worked on
it for three hours before I found the solution below that works both
flexibly and reliably.

How do I remove duplicate records in Excel?

The instructions from Microsoft
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q262277 and the
macro code that is being passed around,
http://www.ozgrid.com/VBA/RemoveDuplicates.htm will only remove
duplicates based on one column.

Instead:
Save the XLS as Type=CSV
Go to http://home.hccnet.nl/s.j.francke/t2t/text2table.htm and download
and install CSVed.
Open CSVed.
Open the CSV file you made.
Go to the Misc tab in CSVed.
The middle section is "Remove Duplicates."
Check the box next to the column(s) you want scanned for duplicate
entries and click "Remove Duplicates". The rest is done
automatically.

Select the "Export" tab and export to Excel. You have a de-dupped
list.

CSVed also lets you add a prefix or suffix to all entries in a
column, merge columns, and split columns at a position or at a character.
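For anyone who prefers a script to a GUI, the multi-column de-dup CSVed performs boils down to "keep the first row for each combination of chosen columns." A minimal Python sketch of that idea (the column names and sample data are illustrative, not CSVed's internals):

```python
import csv
import io

def dedup_csv(text, key_columns):
    # Keep the first row seen for each combination of values in key_columns.
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    kept = []
    for row in reader:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:  # first occurrence wins
            seen.add(key)
            kept.append(row)
    return kept

sample = ("Last,First,State\n"
          "Smith,John,Ohio\n"
          "Smith,Aaron,Minnesota\n"
          "Smith,John,Ohio\n")
rows = dedup_csv(sample, ["Last", "First", "State"])
# rows now holds one copy of each unique (Last, First, State) record
```

Unlike the one-column Microsoft approaches above, any set of columns can serve as the duplicate test.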

This is going on the Best of site soonest.

John H.

John's Best of Free Software
www.jhoodsoft.org
 

Susan Bugher

< snip >

Thanks for the info/review of CSVEd. It's one of the programs I keep meaning to learn. . . any
minute now. . .

<OT>
re the Microsoft instructions:

http://support.microsoft.com/default.aspx?scid=kb;en-us;Q262277

"6. In the Criteria range box, type or select the same range of records (that is, A3:H10)"

The only criteria you want to specify is "unique" and that's done in a checkbox. The above
instruction looks odd to me (I use QuattroPro ($ware) which has a similar procedure). I don't have
Excel so can't try this myself. Might be worth a quick check to see if leaving the "criteria range
box" blank and/or specifying "any record" as the criteria (using blank/wildcard/cedilla?) gives you
the proper result.
</OT>

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://groups.google.no/groups?q=+group:alt.comp.freeware&hl=en
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)
 

John Fitzsimons

This is an MS problem for which there is a freeware fix. I worked on
it for three hours before I found the solution below that works both
flexibly and reliably.
How do I remove duplicate records in Excel?

< snip >

What I want from time to time is to know how to remove records that
are duplicated in Excel. In other words, how do I see only unique
records?

What I do now is export to text and have TED Notepad display only
unique lines (after sorting if needed).

Would be nice to do in Excel though.

Regards, John.
--
****************************************************
,-._|\ (A.C.F FAQ) http://clients.net2000.com.au/~johnf/faq.html
/ Oz \ John Fitzsimons - Melbourne, Australia.
\_,--.x/ http://www.vicnet.net.au/~johnf/welcome.htm
v http://clients.net2000.com.au/~johnf/
 

John Fitzsimons

On Tue, 28 Jun 2005 16:01:17 GMT, Susan Bugher

"6. In the Criteria range box, type or select the same range of records (that is, A3:H10)"
The only criteria you want to specify is "unique" and that's done in a checkbox.

< snip >

Yes, that de-dups in Excel without needing to export anything. Now, as
per my other post. If only I could use Excel to remove duplicated rows
altogether. Leaving behind only records that are not duplicated in the
original. :-(

Regards, John.
 

GlowingBlueMist

John Fitzsimons said:
On Tue, 28 Jun 2005 16:01:17 GMT, Susan Bugher

< snip >

Yes, that de-dups in Excel without needing to export anything. Now, as
per my other post. If only I could use Excel to remove duplicated rows
altogether. Leaving behind only records that are not duplicated in the
original. :-(

Regards, John.
Hi John,
If you know how to get the data into Excel sorted so that all the
duplicate rows adjoin each other, removing them is only a couple more
steps.

Let's say you sorted all of the data so that the duplicates are in column B,
as in the IP names, starting at the top of the page in cell B1. The rest of
the IP names go down from there with no blank entries.

If column C is empty, here is the formula to enter into the empty
cell C1:
=IF(B1=B2,"Dupe","No Dupe")

Then click on any other empty cell.

Come back to C1 and highlight that cell. This should place a black box
around the cell along with a small little square in the lower right corner
of cell C1.

Double click on that little square at the bottom right corner of C1 and the
formula should propagate down through C2, C3, etc, until a blank cell shows
up in column B.

Now highlight all of column C

Do a right click with your mouse inside the highlighted column C and click
on Copy.

Now with column C still highlighted, right-click with your mouse in the
highlighted column C and choose Paste Special. When the Paste Special window
opens, choose the Paste "Values" button. This will remove the formula from
all of column C but leave the Dupe or No Dupe message in it.

Now sort the entire spreadsheet using column C and all the Dupe or No Dupes
will clump together. Highlight all of the Dupe rows and delete them.

Then delete the bogus column C you created in the first place.

Finally, do one last sort to get the data in the format you want it in and
you are just about done.

I know these instructions are a little sparse but if you play around a
little with dummy data you should get the hang of it.

Finally, save the spreadsheet in TAB-delimited format, not Excel's.
Then rename the file so it is just called hosts and not hosts.txt or some
such, and you should be done.
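For readers who want to check the logic outside Excel, GlowingBlueMist's helper column can be sketched in Python. The =IF(B1=B2,...) formula flags a sorted row when it matches the row below it, so the last copy of each duplicate group survives (sample data is illustrative):

```python
def flag_dupes(values):
    # Mirror the sorted column B plus =IF(B1=B2,"Dupe","No Dupe") in column C:
    # a row is flagged when it equals the row below it, so the last copy
    # of each duplicate group is the one that survives.
    values = sorted(values)
    flags = []
    for i, v in enumerate(values):
        below = values[i + 1] if i + 1 < len(values) else None
        flags.append("Dupe" if v == below else "No Dupe")
    return list(zip(values, flags))

rows = flag_dupes(["Tom", "Dick", "Tom", "Harry"])
kept = [v for v, f in rows if f == "No Dupe"]
# kept holds one copy of each record: Dick, Harry, Tom
```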

Good luck.
 

Diablo

Use the "autosort" function and use custom parameters, or try ASAP
Utilities...v3.10 was just released today
http://www.asap-utilities.com/index.php?page=p_dl.php?
| On 28 Jun 2005 06:34:04 -0700, (e-mail address removed) wrote:
|
| >This is an MS problem for which there is a freeware fix. I worked on
| >it for three hours before I found the solution below that works both
| >flexibly and reliably.
|
| >How do I remove duplicate records in Excel?
|
| < snip >
|
| What I want from time to time is to know how to remove records that
| are duplicated in Excel. In other words how do I only see unique
| records ?
|
| What I do now is export to text and have Ted notepad only display
| unique lines (after sorting if needed).
|
| Would be nice to do in Excel though.
|
| Regards, John.
 

John Hood

John said:
< snip >

What I want from time to time is to know how to remove records that
are duplicated in Excel. In other words how do I only see unique
records ?

What I do now is export to text and have Ted notepad only display
unique lines (after sorting if needed).

Would be nice to do in Excel though.

Regards, John.
John:
You can do it in Excel if you want to:

From: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q262277

To create a compiled list of unique records, follow these steps:
1. Select the column titles above rows or records that you want to sort.
2. On the Data menu, point to Filter, and then click Advanced Filter.
If you are prompted as follows, click OK.
No headers detected. Assume top row of selection is header row?
3. Under Action, click Copy to another location.
4. Click to select the Unique records only check box.
5. In the List range box, type or select the range of records (such
as A3:H10).
6. In the Criteria range box, type or select the same range of
records (that is, A3:H10)
7. In the Copy to box, type the address of or select the first cell
where you want the information to be placed. Make sure there is enough
room so you don't overwrite pre-existing data. NOTE: You can only do
this in the active sheet.
8. Click OK.
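What those eight steps are meant to produce - one copy of each whole row, first instance kept - can be expressed in a couple of lines of Python for comparison (a stdlib sketch, not Excel's actual implementation):

```python
def unique_records(rows):
    # Advanced Filter's "Unique records only", roughly: keep the first
    # instance of each whole row, preserving the original order.
    # dict.fromkeys de-duplicates while remembering insertion order.
    return list(dict.fromkeys(rows))

data = [("Smith", "John", "Ohio"),
        ("Smith", "Aaron", "Minnesota"),
        ("Smith", "John", "Ohio")]
result = unique_records(data)
# result keeps the first Smith/John/Ohio row and drops the repeat
```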

Me again -
The problem is, you end up with both the duplicated list and the de-dupped
list, and you have to delete the original. You have to have Excel recognize
the first row as a header, and you can only do it reliably on one criteria
cell. It works, badly. This was the result of half a day of testing against
live data (25,000 records). For example, it removed all of the Smiths
except one, because we couldn't specify more than one criteria cell, i.e.
Smith, John, Ohio
Smith, Aaron, Minnesota
Smith, Shelly, Wisconsin <-- this is the only one that stayed
Smith, John, Ohio
Smith, Aaron, Minnesota

CSVed gives you much more control over this and is much easier and
faster. We de-dupped on Last, First, and State, and got:
Smith, John, Ohio
Smith, Aaron, Minnesota
Smith, Shelly, Wisconsin

Just like we wanted, and CSVed did it in less than 15 seconds.

John H.
 

Susan Bugher

John said:
On Tue, 28 Jun 2005 16:01:17 GMT, Susan Bugher

< snip >

Yes, that de-dups in Excel without needing to export anything. Now, as
per my other post. If only I could use Excel to remove duplicated rows
altogether. Leaving behind only records that are not duplicated in the
original. :-(

Regards, John.

Create a table that doesn't have the duplicate entries. Use that as your new table and delete the
original table. Not quite what you asked for but the end result is the same.

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://groups.google.no/groups?q=+group:alt.comp.freeware&hl=en
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)
 

Simon

In our last episode GlowingBlueMist wrote:

<snipped for brevity>

This is the process I've used for a while now, and it works fine, although it
can be a little involved...

The downside to this is where there are multiple columns that need to be
checked (as in the example quoted), although you may be able to use nested
IFs in your formula - depends on your data.

Another solution would be to concatenate all the columns then filter on that,
i.e. in the first empty column after your data add "=A1&B1&C1" etc. and you'll
end up with a single string containing ALL the text from each cell in the row
(I would limit this to just the columns you want to filter on). Then re-sort
the data based on the column you created, and add the formula
=IF(D2=D1,"DUPE",""). Apply an autofilter and filter on non-blanks (or
"DUPE"). Select the lot and delete (it is not necessary to copy and paste
the values). You should now have a 'clean' list...
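Simon's concatenation trick translates directly to a script. A Python sketch (the row data is illustrative); one caveat worth noting is that plain concatenation can collide ("ab"+"c" equals "a"+"bc"), so joining with a separator character that never appears in the data is safer:

```python
def concat_key_dedup(rows, key_indices, sep="|"):
    # The =A1&B1&C1 idea: join the chosen cells into one string and treat
    # rows with matching strings as duplicates. A separator avoids
    # accidental collisions like "ab"+"c" vs "a"+"bc".
    seen = set()
    kept = []
    for row in rows:
        key = sep.join(row[i] for i in key_indices)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

rows = [["Smith", "John", "Ohio"],
        ["Smith", "John", "Iowa"],
        ["Smith", "John", "Ohio"]]
clean = concat_key_dedup(rows, [0, 1, 2])
# clean holds the Ohio row once plus the Iowa row
```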

if you use VBA there are other ways...

S
 

Chatterley

John Fitzsimons said:
< snip >

Would be handy if it worked. But it doesn't. It just de-duped. I
wanted duplicates removed so that I was only left with unique records.

With something like ;

Tom
Tom
Dick
Harry

I end up with ;

Tom
Dick
Harry

Tom wasn't removed.

Regards, John.



You could concatenate all columns of every row, and then do the Data Filter
trick.
Not ideal though.

Also thought that you could use Pivot Table to do a count of data that may
be duplicated, and then
sort the result to find all non-1 counts. Then delete by hand. Also messy.

rgds
 

John Fitzsimons

Susan Bugher wrote:
Create a table that doesn't have the duplicate entries.

< snip >

That's what I want to do. The above simply de-dups. It doesn't remove
the original, and copies, of duplicates i.e. only show unique entries.

Regards, John.
 

JL Hotz

John said:
< snip >

Would be handy if it worked. But it doesn't. It just de-duped. I
wanted duplicates removed so that I was only left with unique records.

With something like ;

Tom
Tom
Dick
Harry

I end up with ;

Tom
Dick
Harry

Tom wasn't removed.

Regards, John.

What I have done in the past in Excel is, first sort records, then
create a formula in another column something like (assuming your
records are in column A) "=IF(A2=A1,999,0)". Copy & paste this formula
all the way down to the end of your records. A2=A1 will automatically
change to A3=A2, A4=A3, etc., when you paste. If I remember correctly
that formula will show the number '999' (or any other number or text
you choose) in the column with the formula wherever a record is
identical to the one above it. If it is not the same as the one above, a
'0' (or whatever you choose) will show in the column with the formula.
Remember the data must have been sorted first to get the duplicates
next to each other. Next select the column with the formulas (999's &
0's), copy it and then paste-special "formulas as values" back to the
same cells (or even in a different column if you want). If pasting over
the same cells, the formulas are replaced by the values (999 or 0). Now
sort all data by the column with the values (not formulas) and the
999's will fall to the end. Select all the 999 rows and delete them.
You now have removed all duplicates. You can then delete the
formula/number column and sort the remaining records as you wish. Of
course it's best to work with a copy of the file in case of problems.
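Jerry's flag-and-sort procedure has the same effect as this short Python sketch: after sorting, any record equal to the one above it gets the 999 flag and is dropped (sample data is illustrative):

```python
def drop_repeats(values):
    # The =IF(A2=A1,999,0) column in Python: after sorting, flag each
    # record that matches the one above it, then delete the flagged rows.
    values = sorted(values)
    flagged = [(v, 999 if i > 0 and v == values[i - 1] else 0)
               for i, v in enumerate(values)]
    return [v for v, flag in flagged if flag == 0]

result = drop_repeats(["Tom", "Tom", "Dick", "Harry", "Harry"])
# one copy of each record remains
```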

regards, Jerry
 

Susan Bugher

John said:
< snip >

That's what I want to do. The above simply de-dups. It doesn't remove
the original, and copies, of duplicates i.e. only show unique entries.

Hello John,

I don't know if we're failing to communicate or if QuattroPro and Excel treat this particular
operation quite differently. I'm still proceeding on the assumption that they have similar
capabilities. In QuattroPro creating a "no duplicates" table is a simple operation - an example:

1. specify the source table (Software:A1..AS3817)
2. specify a criteria table.

You must *specify* a criteria table. In this case I don't want to specify a criteria and I don't
have to. As a minimum input I can use any real field name and leave the criteria cell in the table
blank. I can even specify a *non-existent* field name (column heading) - then the table looks like this:

SomePage cell A1: ZZZZZZZ (as the field name)
SomePage cell A2: blank cell (as the criteria)

3. specify the output table => copy the first row of the source table, paste it on another
page/sheet and then specify the area below: SomeOtherPage:A1..AS3817 (or SomeOtherPage:A..AS)

4. click on "extract unique"

The output table will contain one instance of each unique record - IOW - a copy of the original
table *without* duplicate entries. ISTM you should be able to do this in Excel too. . .

a few more comments. . .

I use QuattroPro's "unique" extraction option to create the Category Index pages for the PL and the
ACF list:

http://www.pricelesswarehome.org/2005/PL2005CategoryIndex.php
http://www.pricelesswarehome.org/acf/P_CategoryIndex.php

For those pages I want each unique Category/Subcategory name. IOW - I only want QuattroPro to
compare a few of the columns.

QuattroPro compares *OUTPUT* fields to determine whether or not a record is "unique". I limit the
output to just the columns I'm interested in and that returns the "unique" results for that set of
columns.

I hope you can "translate" the QuattroPro procedures into equivalent Excel
procedures. If not I guess we'll just have to agree that Excel is an
inferior product. ;)

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://groups.google.no/groups?q=+group:alt.comp.freeware&hl=en
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)
 

Adrian Carter

.... snip
Would be handy if it worked. But it doesn't. It just de-duped. I
wanted duplicates removed so that I was only left with unique records.

With something like ;

Tom
Tom
Dick
Harry

I end up with ;

Tom
Dick
Harry

Tom wasn't removed.

Regards, John.

Hi John,

I have a tray utility that removes duplicates from a column of text in the
clipboard. At present, it would leave Tom in your example, but it would be
a snap to create a version for what you want to do - if you want it. About
15-30 min, depending on responses to the following:
1. Are comparisons case-sensitive? ie are Tom and TOM dups or different?
2. Would you want the original order of rows to be restored (after removing
dups)?
3. When comparing rows, do you consider the whole row, or just a key field?

Regards,
Adrian
 

JL Hotz

John said:
Now, as per my other post. If only I could use Excel to remove
duplicated rows altogether. Leaving behind only records that are
not duplicated in the original. :-(


< snip >

That's what I want to do. The above simply de-dups. It doesn't remove
the original, and copies, of duplicates i.e. only show unique entries.

Regards, John.

If I understand correctly, you want to also remove the original of
any duplicated records. I'm not sure why, but .....

What I have done in the past in Excel is, first sort records so that
all duplicates are together, then (assuming your records are in column
A), create a formula in the 2nd row of column B something like
"=IF(A2=A1,100,0)". Copy & paste this formula all the way down to the end
of your records. A2=A1 will automatically change to A3=A2, A4=A3, etc.,
when you paste. That formula should show the number '100' in the column
with the formula wherever a record is identical to the one above it. If
it is not the same as the one above, a '0' will show in the column with
the formula. Remember the data must have been sorted first to get the
duplicates next to each other. This step checks the record immediately
above to see if it is the same.

Next step is to do the same check for the record below. In column C,
put a formula starting on the 1st row: "=IF(A1=A2,100,0)". Copy and paste
that formula down to the end of your data.

Then in column D, sum the values of column B & C. The sum should be
either 0 or 100 or 200.

Now copy column C and "paste-special formulas as values into column D.
This gets rid of the underlying formulas that would otherwise produce
different results after a re-sort. Only the values (0, 100, 200) remain.

A       B     C     D
-----  ----  ----  ----
Tom           100   100
Tom     100   100   200
Tom     100     0   100
Dick      0     0     0
Harry     0     0     0

Then sort all data on column D. The 0's will be first, then the 100's
and finally the 200's. Delete all rows with 100's and 200's. The
duplicates and the original of any duplicated record will be gone. In
the above example the 3 "Tom's" would fall to the end of the list
because, sorting on column D the values for "Tom" are either 100 or
200. Easy to select everything higher than zero and delete.

Clean up by deleting the columns with formulas and values and re-sort
the remaining records as you like.

If you have never done something like this before, be advised to work
with a copy of the file so you can restore in case there are problems.

Hope this makes sense.

Jerry.
 

JL Hotz

Let me make an important correction to my earlier post......

JL said:
John Fitzsimons wrote:


< snip >

Now copy column C and "paste-special formulas as values into column D.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

the above should have said:

Copy column D and paste-special "formulas as values" back over itself.
(another option is to paste the values into another column such as E)
This gets rid of the underlying formulas that would otherwise produce
different results after a re-sort. Only the values (0, 100, 200)
remain.

A       B     C     D
-----  ----  ----  ----
Tom           100   100
Tom     100   100   200
Tom     100     0   100
Dick      0     0     0
Harry     0     0     0

< snip >

Sorry for the error. It's been a long, hot week.
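The corrected B+C+D procedure amounts to keeping only records whose total count is one. For readers who want to verify the logic, a Python sketch of that rule (sample data matches Jerry's Tom/Dick/Harry example):

```python
from collections import Counter

def unique_only(values):
    # The effect of summing the above/below checks: any record whose value
    # appears more than once gets a nonzero flag, so only records
    # occurring exactly once survive.
    counts = Counter(values)
    return [v for v in values if counts[v] == 1]

result = unique_only(["Tom", "Tom", "Tom", "Dick", "Harry"])
# every copy of Tom is gone, including the "original"
```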
 

John Fitzsimons

I have a tray utility that removes duplicates from a column of text in the
clipboard.
At present, it would leave Tom in your example, but it would be a snap to
create
a version for what you want to do - if you want it. About 15-30 min,
depending on responses to the following:
1. Are comparisons case-sensitive? ie are Tom and TOM dups or different?
2. Would you want the original order of rows to be restored (after removing
dups)?
3. When comparing rows, do you consider the whole row, or just a key field?

Hi Adrian,

Thank you for your offer. I was however ideally wanting to do this
in Excel, or similar. If I want unique lines in a text file I just use
the excellent TED Notepad. From ;

tom
tom
dick
harry
harry
harry

I end up with

dick

only.

The reason I want a quick/easy way to get rid of dupes in spreadsheets
is because I can have a booklist for March and a booklist for April.

If I could remove everything that is in both spreadsheets then I am
closer to finding out which are the "new" titles.

What Jerry had to say seems to be getting me closer to what I want.
Here is EXACTLY what I would like ;

Booklist for March :

Book 1
Book 2
Book 3

Booklist for April:

Book 3
Book 4
Book 5

Result: a spreadsheet/list/file giving me the discontinued books and
the new books.

E.g. discontinued books: Book 1, Book 2.
New book additions: Book 4, Book 5.

As you can see, I don't want to see duplicates (eg. 3), books that
are listed in both month's lists.
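John's booklist comparison is a plain set difference. A Python sketch of the whole request (the book titles are his example data):

```python
def compare_booklists(march, april):
    # Titles only in March are discontinued; titles only in April are new.
    # Titles present in both months are excluded from both result lists.
    april_set = set(april)
    march_set = set(march)
    discontinued = [b for b in march if b not in april_set]
    new = [b for b in april if b not in march_set]
    return discontinued, new

gone, added = compare_booklists(["Book 1", "Book 2", "Book 3"],
                                ["Book 3", "Book 4", "Book 5"])
# gone: Book 1, Book 2; added: Book 4, Book 5
```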

Regards, John.
 

John Fitzsimons

Susan Bugher wrote:

4. click on "extract unique"
The output table will contain one instance of each unique record - IOW - a copy of the original
table *without* duplicate entries. ISTM you should be able to do this in Excel too. . .

< snip >

A table without duplicate entries is simply the table after de-duping.
As I said, I don't want to de-dupe. I want unique entries only.

Regards, John.
 

John Fitzsimons

Hi Jerry,

See my booklist post in this thread.

Worked perfectly. :)

Neat way to see duplicated records.

Okay. Works fine.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the above should have said:
Copy column D and paste-special "formulas as values" back over itself.
(another option is to paste the values into another column such as E)

I much prefer the latter. How would I do that please? I want to avoid
copying and paste-special. Too time consuming.

< snip >

Regards, John.
 
