Duplicate Cells

R

Richard Mahan

I have a rather large spreadsheet of Names, Addresses etc. and I know
there are many duplicates. I would like to know how I can find and highlight
the duplicates in a column. I don't want to remove them, only highlight
them. if the duplicates could be highlighted in a color that would be great.
Please keep in mind that I just started with spreadsheets and I'm not that
familiar with the application. thanks
 
G

Guest

Hi Richard:

Names and addresses are really quite difficult to match due to the problems
of representing the names is in:

Richard Mahan
Rick Mahan
R Mahan
RS Mahan
Richard S Mahan
Racheal Mahan
R Mahan

Which one are the same.

OK lets look at the address......

12 South Bank Road
12 South-Bank Road
12 South Bank Road,
12 S. Bank Road
12 South Bank Rd
12 South Bank Rd.
12 S. Bank Rd.
12 S Bank Rd
12 South bank road

Got the idea

OK where can we start

You can try conditional formating from the format menu:

Select the cells where the names are as in here column A cells A1:A20

select conditional formating and select formula on the left hand side
then enter the following formula in the formua area on the right.

=COUNTIF($A:$A,INDIRECT("A"&ROW()))>1

Then select format button and clcik patterns and select a color.
It should then highligh the cells that are exactly the same.
 
S

Saul

I have a rather large spreadsheet of Names, Addresses etc. and I know
there are many duplicates. I would like to know how I can find and highlight
the duplicates in a column. I don't want to remove them, only highlight
them. if the duplicates could be highlighted in a color that would be great.
Please keep in mind that I just started with spreadsheets and I'm not that
familiar with the application. thanks

I do this regularly with small databases of names and addresses (up to
around 25000 items) from commercial companies that are supposedly
clean and de-duped.

Normally I'd start by spliting names into First Name and Last name
(Text->Columns), then I'd apply a sort for the fields I'd be
interested in checking. As most of the lists are UK based I'd normally
start by matching postcodes or zipcode as the postcode information in
the UK relates to small areas. If you have the full US code you can
use this, wider area codes are more difficult.

To match postcodes I'd personally sort the data by postcode then
create an additional column with if(A2=A1,1,"") which marks any rows
which are the same as the row above. I prefer this than using
conditional formating as it allows you to delete the mark from any
rows that you are happy are genuinely different. Also at the end of
checking you can do a Copy->Paste Special->Values to preserve the
marks and sort and delete the rows you didn't want.


You can apply the same to telephone numbers, but you usually have to
do some formating first (eg removing spaces and hyphens). And to email
addresses if you have them.

If there is a lot of overlap then you can do a double sort by postcode
then last name and make the marker match both on postcode and last
name, or you can build a score that measures the level of match - eg
last name, postcode and telephone match mark them at 3, just one of
these mark them at 1 etc.

Names and addresses without the postcode or other more specific data
are more difficult and so are the part I'd do last of all. Usually I'd
sort by last name first with first name/first initial second, then
something like town, possibly with the first character of the street
name. This depends on your data as to how far you'd push it.



Saul
www.notanant.com
communities of websites
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top