Sorting data into a new column

P

pbc444

I have three large sets of data (each column has up to 50,000 rows). The
data are names (text) not numbers. I would like to create a fourth data
set comprised of the names that are present in ALL three original
columns. Basically, if a name is on list one, two, AND three, then it
is placed on the fourth list. If the name is on just one or two of the
three, then it will not be placed on the fourth list. Can I complete
this sorting in Excel automatically? The datasets are too large to
manipulate manually. Thanks.
 
J

johncassell

Hello, I'm not sure what you want to see in the fourth column if i
meets your criteria but this one will put all names into the fourt
column.

So, cell A1 contains John
B1 contains Ryan
C1 contains Smith

the formula in D1 is: =IF(AND(A1<>"",B1<>"",C1<>"")=TRUE,A1&B1&C1,"")

This will return the result JohnRyanSmith.

If only John and Ryan were there then it would return a blank value.

Hope this is what you were after

Thanks

John
 
J

johncassell

Sorry, i've just read your question again and think I've got the wrong
end of the stick.

Just need something clarifying:

If in cell A1 there was 'John' which appeared in all columns throughout
the 50,000 rows and in B1 there was 'Ryan' which also appeared in all
columns and then in C1 there was 'Smith' which appears in all columns -
What would you want to see in cell D1??

John
 
P

pbc444

Hi, thanks for the help. Here is an example that I hope will clarify.
Imagine three columns. In Column 1 there are three names (Peter, Paul,
Mary), with one name in each cell (A1, A2, A3). In Column 2, there are
four names (Paul, Mary, Luke, John) in cells B1, B2, B3, and B4,
respectively. Finally, in Column 3 there are three names (Paul, Mary,
James), in cells C1, C2, C3.

Given the above, I would like to sort the three lists into a new column
4, which would only contain that names that are present in all the other
columns. For this example, Column 4 would contain Paul and Mary in
cells D1 and D2, as those two names appear in Columsn 1-3.

Thanks again for the help. I really appreciate it.
 
D

Dave Peterson

Pick out the column with the shortest list of names (I'm gonna use A)

Then insert 4 columns next to that column (New columns B:E)

And say the other 3 columns shifted to the right to f:h.

Then in B2 (headers in row 1)
=isnumber(match(a2,f:f,0))

In C2:
=isnumber(match(a2,g:g,0))

In D2:
=isnumber(match(a2,h:h,0))

In E2:
=countif(b2:d2,true)

Drag all those formulas down the length of column A.

Let excel recalculate
(go get a cup or two of coffee with 50k rows!)

Select columns B:E
edit|copy
edit|paste special|values
(It'll make the next step quicker)

Select column E
data|filter|autofilter

Filter to show just the 3's (it's in column A and 3 appears 3 times in B, C and
D.

Select column A
edit|goto|special|visible cells only
edit|copy
and paste to a new worksheet.

That calculation could take a while. You may want to drag down in smaller
increments.
 
A

Alex

Here is another way. Assuming your data is in columnes A, B, and C and
that the first row is a column title of some kind, I got this to work.

In column D, put the following formula into D1. Drag it down to as far
as you think you will ever have data.

=IF(AND((COUNTIF(B:B,A2)+COUNTIF(C:C,A2))=2,A2<>""),A2,"")

Turn the first row titles into autofilter.

Go to the filter for column D and sleect "NonBlanks".

Voila!

Alex.
 
D

Dave Peterson

What happens if the name appears in column B twice, but 0 times in column C.

And where's the 4th column <vbg>?

=IF(AND((COUNTIF(B:B,A2)>0,COUNTIF(C:C,A2)>0,COUNTIF(d:d,A2)>0,A2<>""),A2,"")

But with giant lists (50000 rows), I've always seen =countif() much slower than
=match()

=if(and(isnumber(match(a2,b:b,0)),
isnumber(match(a2,c:c,0)),
isnumber(match(a2,d:d,0))),a2,"")

might be another approach.
 
A

Alex

I thought that excel could go to 32000+- rows anyway. I would be
looking to put this in access or something anyway.

Alex.
 
D

Dave Peterson

Excel can go 32000+. All the way to 65536 rows.
I thought that excel could go to 32000+- rows anyway. I would be
looking to put this in access or something anyway.

Alex.
 
P

pbc444

Thanks to everyone for the help. I actually ended up using an array to
extract common values from two lists, and then did an array of the
array output.

=IF(COUNTIF($A$1:$A$10,B1)>0,B1,"")

It nearly melted my laptop, but did work. I agree that data sets this
large shouldn't be in Excel.

--Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top