How do I create a list of unique values from 2 columns of data

  • Thread starter Thread starter FrozenRope
  • Start date Start date
F

FrozenRope

I have 2 columns of data. Column A has 50,000 records and Column B has 4,000
records (many of which are in Column A).

I need to generate Column C which should be all of the values from Column A
that are not also in Column B. The net yield for Column C should be roughly
46,000 records.

How can I go about solving this problem?

Many thanks!!!!
 
I have to do this sort of thing often, determine which of a column of values
do (or do not) appear in another column. There are multiple ways to do it,
but my favorite goes like this: Let's see, you want to a list of values in A
that do not appear in B. Ok, in column C opposite one of the values in A,
say row 2, put the formula "=MATCH(A2,B,0)". I'm used to working with R1C1
notation so I may have done this wrong, but the MATCH function should look in
column B for any exact match on the cell in column A, and return either the
row number where a match occurred or some error, #N/A I think. Copy this to
each cell in column C opposite the values in A and you can tell at a glance
which ones match and which ones don't.

But you don't want to "tell at a glance", you want a list of the values that
aren't there. So wrap an IF around the MATCH:

=IF(ISERROR(MATCH(A2,B,0)),A2,"")

(Or is it "ISERR"? I think you want the one that's spelled out, "ISERROR".)
If the match returns #N/A then it means the value in A is not in B, so it
lists the value in A; if it found a match in B it puts a null string in C.
Now you can sort column C so that all the null strings are at one end of the
column, and the rest of the values are the ones you want.
 
Another play which should deliver the goods for you
Assuming data in row2 down in cols A and B
Put in C2:
=IF(A2="","",IF(ISNUMBER(MATCH(A2,B:B,0)),"",ROW()))
Leave C1 blank

Put in D2:
=IF(ROWS($1:1)>COUNT(C:C),"",INDEX(A:A,SMALL(C:C,ROWS($1:1))))
Select C2:D2, copy down to the last row of data in col A, ie down to D50001.
Col D will return the required results, ie items in col A not found in col B,
all neatly bunched at the top.
 
Back
Top