Counting unique entries across two or three columns

T

toadflax

Hi,

I'm looking for a way to get a count of unique entries across multiple
columns of data. Say for example I had (with semi-colons indicating
the columns A B and C):
Smith;John;H
Smith;John;K
Smith;Catherine;
Jones;John;
Jones;Susan;L
Jones;Susan;B

I want to be able to get a count of:
(a) all the unique surnames/families (for the above example it's 2:
Smith and Jones). I think I have done this using a formula I found on
the web, which is =SUMPRODUCT((A1:A6<>"")/COUNTIF(A1:A6,A1:A6&""))
(b) all the unique combinations of surname and first name (in the
above example it's 4, Smith, John; Smith, Catherine; Jones, John and
Jones, Susan)
(c) all the unique combinations of surname, first name and middle
initial. However, I don't want to count those without an initial i.e.
where the cell is blank. So for the above example I want the answer 4:
Smith, John H; Smith, John, K; Jones, Susan L; Jones, Susan B).

I just need running counts of these different categories as I add data
and I don't want to use filters. Can this be done with functions? I'd
really appreciate any help.


Thanks very much,
Michelle
 
D

Dave Peterson

I'd use some helper cells.

=a1&";"&b1
and drag down

and another column of
=a1&";"&b1&";"&c1
and drag down.

Then use that formula to count uniques in those helper columns.

(use a character that doesn't show up in any of the name fields, though.)
 
R

Ron Rosenfeld

Hi,

I'm looking for a way to get a count of unique entries across multiple
columns of data. Say for example I had (with semi-colons indicating
the columns A B and C):
Smith;John;H
Smith;John;K
Smith;Catherine;
Jones;John;
Jones;Susan;L
Jones;Susan;B

I want to be able to get a count of:
(a) all the unique surnames/families (for the above example it's 2:
Smith and Jones). I think I have done this using a formula I found on
the web, which is =SUMPRODUCT((A1:A6<>"")/COUNTIF(A1:A6,A1:A6&""))
(b) all the unique combinations of surname and first name (in the
above example it's 4, Smith, John; Smith, Catherine; Jones, John and
Jones, Susan)
(c) all the unique combinations of surname, first name and middle
initial. However, I don't want to count those without an initial i.e.
where the cell is blank. So for the above example I want the answer 4:
Smith, John H; Smith, John, K; Jones, Susan L; Jones, Susan B).

I just need running counts of these different categories as I add data
and I don't want to use filters. Can this be done with functions? I'd
really appreciate any help.


Thanks very much,
Michelle

One way:

Download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr/morefunc/english/index.htm

Then use these functions: (Surname,FN and MI refer to named ranges which are
at least as large as your ranges containing that data.

IMPORTANT**: The second and third formulas are ARRAY-FORMULAS and must be
entered with <ctrl><shift><enter>. If you do that correctly, Excel will place
braces {...} around the formulas.

Surname =COUNTDIFF(Surname)
Surname+FN =COUNTDIFF(Surname&FN,,"")
SN+FN+MI =COUNTDIFF(IF(LEN(MI)>0,Surname&FN&MI),,FALSE)

--ron
 
R

Rose

Hi,
Extending from Michelle's case, how to generate the counting table looks
like the following?

Smith Jones H I J K ....
John 2 1
Catherine 1 0
Susan 0 2
A
B
C
D
....
 
T

toadflax

So there's really no way to do it without additional columns or a
download? I'm a bit reluctant to do that because I need to hand the
file over to someone else for data entry, so I'd really like something
that will keep recalculating without having to add additional software
or columns...


Cheers,
Michelle
 
J

JP

Try this:

=IF(COUNTA(B2:C14)=SUM(1/COUNTIF(B2:C14,B2:C14)),"All Unique","Some
dupes")

Adjust ranges as appropriate, I assumed B2:C14

This is an array formula so Ctrl-Shift-Enter to commit


HTH,
JP
 
R

Ron Rosenfeld

So there's really no way to do it without additional columns or a
download? I'm a bit reluctant to do that because I need to hand the
file over to someone else for data entry, so I'd really like something
that will keep recalculating without having to add additional software
or columns...


Cheers,
Michelle

Well, it is possible to embed morefunc.xll in the workbook you hand over. That
is a menu option.
--ron
 
H

Harlan Grove

(e-mail address removed) wrote...
I'm looking for a way to get a count of unique entries across
multiple columns of data. Say for example I had (with semi-colons
indicating the columns A B and C):
Smith;John;H
Smith;John;K
Smith;Catherine;
Jones;John;
Jones;Susan;L
Jones;Susan;B

Assume these entries were in A1:C6.
I want to be able to get a count of:
(a) all the unique surnames/families (. . .). I think I have done
this using a formula I found on the web, which is
SUMPRODUCT((A1:A6<>"")/COUNTIF(A1:A6,A1:A6&""))

That's the best formula for it.
(b) all the unique combinations of surname and first name (in the
above example it's 4, . . .)

Gets more complicated, but still possible with a single ARRAY formula
using only built-in functions.

=SUM(1/MMULT(--(A1:A6&";"&B1:B6=TRANSPOSE(A1:A6&";"&B1:B6)),
ROW(A1:A6)^0))
(c) all the unique combinations of surname, first name and middle
initial. However, I don't want to count those without an initial
i.e. where the cell is blank. So for the above example I want the
answer 4: . . .

Mor complicated still, but possible with another ARRAY formula.

=SUM(IF(C1:C6>"",1/MMULT(--(A1:A6&";"&B1:B6&";"&C1:C6
=TRANSPOSE(A1:A6&";"&B1:B6&";"&C1:C6)),ROW(A1:A6)^0)))
 
T

toadflax

Yes! Thank you! That has done the trick very nicely.

Thanks so much for all the suggestions, I really appreciate them. Now
I just have to formulate the questions for all the other stuff I want
to do with these data! :->


All the best,
Michelle
 
I

IIvoicemail

Hi, I found this post very interesting and am very impressed with the
formulas provided, however I would like to ask for something extra
which I don't know if is actually possible (without VBA).

From the above post..
Gets more complicated, but still possible with a single ARRAY formula
using only built-in functions.
=SUM(1/MMULT(--(A1:A6&";"&B1:B6=TRANSPOSE(A1:A6&";"&B1:B6)), ROW(A1:A6)^0))

I would like a modification to so that only the unique combinations of
surname and first name where first name = John are returned (thus 2 in
the example).

Any help would be appraciated as I'm stuck :)

Thanks.
 
R

RagDyer

Say you enter the first name to be used into D1, then try this *array*
formula:

=COUNT(1/FREQUENCY(IF((B1:B6=D1),MATCH(A1:A6,A1:A6,0)),ROW(1:6)))

--
Array formulas must be entered with CSE, <Ctrl> <Shift > <Enter>, instead of
the regular <Enter>, which will *automatically* enclose the formula in curly
brackets, which *cannot* be done manually. Also, you must use CSE when
revising the formula.
 
I

IIvoicemail

Say you enter the first name to be used into D1, then try this *array*
formula:

=COUNT(1/FREQUENCY(IF((B1:B6=D1),MATCH(A1:A6,A1:A6,0)),ROW(1:6)))

--
Array formulas must be entered with CSE, <Ctrl> <Shift > <Enter>, instead of
the regular <Enter>, which will *automatically* enclose the formula in curly
brackets, which *cannot* be done manually.  Also, you must use CSE when
revising the formula.

--
HTH,

RD

---------------------------------------------------------------------------
Please keep all correspondence within the NewsGroup, so all may benefit !









- Show quoted text -

Thanks very much it worked really well!! However I would like to make
another modification which allows the function to evaluate every row
instead of a predefined list (e.g. 1 to 6).

I think I nearly have it but the last statement is causing problems
which contains ROW..

I put the following in a cell (e.g. F3) -> =MATCH("zzzzzzzzzz",A:A)

=COUNT(1/
FREQUENCY(IF((B1:INDEX(B:B,F3)="John"),MATCH(A1:INDEX(A:A,F3),A1:INDEX(A:A,F3),
0)),ROW(2:11)))

Thanks again.
 
R

RagDyeR

Why not just oversize the formula?

=COUNT(1/FREQUENCY(IF((B1:B600="John"),MATCH(A1:A600,A1:A600,0)),ROW(1:600)))


Blanks shouldn't affect the accuracy.
--

Regards,

RD
-----------------------------------------------------------------------------------------------
Please keep all correspondence within the Group, so all may benefit !
-----------------------------------------------------------------------------------------------

Say you enter the first name to be used into D1, then try this *array*
formula:

=COUNT(1/FREQUENCY(IF((B1:B6=D1),MATCH(A1:A6,A1:A6,0)),ROW(1:6)))

--
Array formulas must be entered with CSE, <Ctrl> <Shift > <Enter>, instead
of
the regular <Enter>, which will *automatically* enclose the formula in
curly
brackets, which *cannot* be done manually. Also, you must use CSE when
revising the formula.

--
HTH,

RD

---------------------------------------------------------------------------
Please keep all correspondence within the NewsGroup, so all may benefit !
---------------------------------------------------------------------------<[email protected]>
wrote in message









- Show quoted text -

Thanks very much it worked really well!! However I would like to make
another modification which allows the function to evaluate every row
instead of a predefined list (e.g. 1 to 6).

I think I nearly have it but the last statement is causing problems
which contains ROW..

I put the following in a cell (e.g. F3) -> =MATCH("zzzzzzzzzz",A:A)

=COUNT(1/
FREQUENCY(IF((B1:INDEX(B:B,F3)="John"),MATCH(A1:INDEX(A:A,F3),A1:INDEX(A:A,F3),
0)),ROW(2:11)))

Thanks again.
 
T

T. Valko

If the ranges in question do start on row 1 then there is no problem (until
you insert new rows above the data!).

ROW needs to return an array from 1 to n that will match the output of
MATCH. One way to do that:

ROW(INDIRECT("1:"&F3))

Where F3 is the OP's MATCH formula that defines the end of range:
=MATCH("zzzzzzzzzz",A:A)

On a side note....

This is one of them there top reply to bottom post to top reply to etc. etc.
cluster_____s

<bg>
 
R

RagDyeR

So ... when are all the "misguided" going to wise up and just TOP POST?<bg>
--

Regards,

RD
-----------------------------------------------------------------------------------------------
Please keep all correspondence within the Group, so all may benefit !
-----------------------------------------------------------------------------------------------

If the ranges in question do start on row 1 then there is no problem (until
you insert new rows above the data!).

ROW needs to return an array from 1 to n that will match the output of
MATCH. One way to do that:

ROW(INDIRECT("1:"&F3))

Where F3 is the OP's MATCH formula that defines the end of range:
=MATCH("zzzzzzzzzz",A:A)

On a side note....

This is one of them there top reply to bottom post to top reply to etc. etc.
cluster_____s

<bg>
 
T

T. Valko

when are all the "misguided" going to wise up and just TOP POST?

I don't think they have a choice at Google Groups (where the OP originated).
 
D

David Biddulph

Yes, they have. Google Groups lets them post either at the top ...
I don't think they have a choice at Google Groups (where the OP originated).

.... or in the middle ...

.... or Google Groups lets them post at the bottom, or any
combination.
 
H

Harlan Grove

T. Valko said:
I don't think they have a choice at Google Groups (where the OP
originated).
....

There's always a choice. It's more a matter of the default behavior of
the user's newsreader and their laziness.

Clearly OE users are an extremely lazy bunch.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top