Merging two spreadsheets, avoiding duplicates?

G

George K

I have two excel spreadsheets that each contains the following fields:
Company name, Contact name, email address.

The first is 1600 lines long, and the second about 6000 lines long. All 1600
entries of the first one exist in the second spreadsheet.

My company emailed the names of the first list but afterwards we decided we
had to email more people and exported all our contacts to make the second
6000-name long list. Now we have to make sure we don't email the same people
twice. So, is there a way to subtract the entries that exist in the first
one from the second? Could we use the Email column as the criterion to judge
if the name exists or not?

Thank you. George
 
G

George K

Hi Gord,

Is there any other way? I tried to do it and it does reduce the rows meaning
it filters out data. However, i don't understand what criterion it uses to
filter out data. Also, whether i clicked "unique " or not, again data were
filtered. i don't understand how it works. I would like to use the email
uniqueness as the criterion for duplication. Because i know we only have 1
email per name. Not any other column.

I need to be 1000% sure about this. Sending an email twice even to few of
the 1600 common top financial names would be a good reason to "not be
needed" in the company anymore...

Could you please help me in more detail?

Thanks,
George
 
G

Gord Dibben

George

Did you look at Debra's instructions on this?

I can't explain any better than she has. Did you check the "copy to another
location" and "unique records only"?

I am assuming you copied the 1600 rows to below the 6000 rows giving a total
of 7600 rows.

Also assuming all the data is contiguous(no blank rows or columns) follow
Debra's instructions. The "copy to" cell must be outside the data range.
Make it A7605 which is outside the filtered range.

Or use Debra's instructions for copying to another sheet.

As far as "any other way", you could visit Chip Pearson's site and see how to
"tag" the duplicates and then use the provided code to replace the extras with
a line then filter out those extras and delete them.

Gord
 
D

Dave Hawley

Hi George

Place your complete list into Column "A". Now push Alt+F11 and then go
to Insert>Module and paste in the code below

Sub RemoveDupes()

'Add extra Column, "A" becomes "B"
Columns(1).EntireColumn.Insert

'Filter out duplicates and copy unique list to "A"
Range("B1", Range("B65536").End(xlUp)).AdvancedFilter _
Action:=xlFilterCopy, CopyToRange:=Range("A1"), Unique:=True

'Add extra Column, "B" becomes "A"
Columns(2).EntireColumn.Delete

End Sub

Now click the top right X to get back to Excel. Push Alt+F8, click
"RemoveDupes" and then Run.

Code taken from:
http://www.ozgrid.com/VBA/RemoveDuplicates.htm

***** Posted via: http://www.ozgrid.com
Excel Templates, Training & Add-ins.
Free Excel Forum http://www.ozgrid.com/forum *****
 
G

George K

Thanks guys.

Gord, i 'll try what you said on Monday. Thanks.

However, what is still unclear in my head is the criteria for duplication.
What does it assume as a duplicate?

Example:

COMPANY NAME EMAIL
Ribas John Smith (e-mail address removed)
Ribas Helen Brown (e-mail address removed)
Citibank John Smith (e-mail address removed)
etc

If you take a look, the company name repeats as we have more than one name
for each company, and some names may coinside to be the same. Like "John
Smith". Easy in 6000 names. So, the email column is the proper criterion for
uniqueness.

What i understood from Dave Hawley's reply is that i would have to keep the
one column of interest, the EMAIL in my case, and search for duplicates in
there. If i do that, how would i know whose is each email that remained
after the process. I would have to add the corresponding name and company
name next to each email.

I need to have them together and for every duplicate email, delete the whole
corresponding row.

Does that make sense?

Thanks again,

George
 
G

Gord Dibben

George

Given the data in your example and the fact you want to send one and only one
email to each company, I would filter or tag on column A. This column has the
duplicated company names.

You would wind up with a list of unique company names, with one contact and
email for each company.

If you want a particular person(column B) as a contact for each company, I am
afraid you would be forced to manually delete the extra persons. Sort on
Column A and pick the persons per company you don't want and delete those
rows.

The email column C has no duplicates to filter or tag.

Gord
 
G

George K

Hi Gord,

I'm afraid that what i'm looking for is not what you just described. we 've
taken a different direction.

We want a few names per company, but we want to email each name in each
company only once.

The problem is that we cannot achieve uniqueness by keeping each name only
once. Because as in example 1 there could be two John Smiths, in different
companies, which is ok. It happens.

Example 1:
COMPANY NAME EMAIL
Ribas John Smith (e-mail address removed)
Ribas Helen Brown (e-mail address removed)
Citibank John Smith (e-mail address removed)

All of the above are acceptable. What is not acceptable is the following:


COMPANY NAME EMAIL
Ribas John Smith (e-mail address removed) <--
Ribas Helen Brown (e-mail address removed)
Citibank John Smith (e-mail address removed)
Ribas John Smith (e-mail address removed) <--

See how Ribas - John Smith - (e-mail address removed) exists twice? it's the same
person, in the same company and his email exists twice. These entries are
what i want to get rid of.
Judge by email, and if email exists twice, delete the whole line. Having two
Ribas or two John Smiths is fine. Having two (e-mail address removed) is not ok.

Thank you,

George
 
D

Dave Peterson

How about something close to what you've tried.

Insert a new worksheet.
Back to your data worksheet with the 3 columns.
Select the 3rd column
data|filter|autofilter
do the unique values only, but filter in place--don't copy it elsewhere.

Now select the visible cells of all 3 columns and copy all of that.

Only the visible cells will be copied and when you paste, you'll end up with
just a single copy of each email address.

But you'll be stuck with whichever one excel's data|filter|advanced filter
wanted to keep.

(Simple testing indicates that excel shows the first value in the range. Sort
your data in whatever order if you need to keep a different record.)
 
G

Gord Dibben

George

I have taken the direction that your first sample with no duplicate emails led
me.

You have now posted a different sample which does have duplicate emails.
COMPANY NAME EMAIL
Ribas John Smith (e-mail address removed) <--
Ribas Helen Brown (e-mail address removed)
Citibank John Smith (e-mail address removed)
Ribas John Smith (e-mail address removed) <--

I think you have finally given us a sample we can work with(if this is a real
sample).

Filter on email column uniques and only one (e-mail address removed) will remain.

Dave Peterson has given some direction in his recent post.

Gord
 
G

George K

Thanks guys,

I'll try that tomorrow at work with our real files. It has to work!

Thanks again,
George
 
Y

yvandee

George, I believe Alan's post from another thread aptly answers you
question. First of all, copy the list of 1600 and append it to the en
of the longer list so you have 7600 records. Then apply Alan’
formula:

"
Reply: MULTIPLE DUPLICATE COLUMNS - jonny from lONDON wrote on July 8
2003 5:26 PM EST
Alan wrote on August 07, 2003 16:24 EST
Hi jonny,

If A1:A99 contains a list of items, some of which are duplicates, the
enter the following formula in B1 and copy down to the end of the list


=COUNTIF(A$1:$A99,A1)

This will return a value that tells you how many times the entry i
column A appears in the list.
"

In your case, with the sample you provide, and assuming the list i
7600 records long, the formula, entered in cell D2, would b
=COUNTIF(C$2:$C7601,C2)

Copy the formula down to the end of the list.

This will return 1 for all email addresses that only appear once in yo
list. You then auto filter for 1 in column D and this will return
list of addresses that appear only once (therefore have not been maile
to)..

Hope this helps..
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top