Count frequencies; Pareto of nonumeric values

N

Niko Roorda

Hello,

I have a spreadsheet which contains (a.o.) a column with country names,
like:

USA
Engeland
Nederland
Nieuw Zeeland
Nederland
USA
USA
Engeland
Engeland
Engeland
Engeland
Nieuw Zeeland
Frankrijk
USA
USA
Engeland
Engeland


Now, I want to automatically count the frequencies of all the countries in
this column. Of course I can do this by first making a list of all the
various countries occurring in the column, and then use the Countif
function. But then I have to make this list myself, and every time when a
new country is entered into the column, the list of occurring countries has
to be updated by hand.
Is there a way in which Excel can make this list automatically?

What I really want with it is create a Pareto graph. Unfortunately, when I
use the Statistical Analysis tool, trying to create a histogram, I receive
an error, saying that the data are non-numeric. Well, of course the country
names are nonnumeric, but why would that make it impossible to create a
histogram?

In other words, I have 2 questions:
1. How can I make an automatic list of all the unique entries in a range?
2. How can I make a histogram, a pareto analysis or a frequency counting of
non-numerical data?


Yours, Niko
 
A

Andy B

Hi

Try using a pivot table (from Data / Pivot Table). This is one of the best
ways of summarising data.
 
H

Harlan Grove

Niko Roorda said:
I have a spreadsheet which contains (a.o.) a column with country
names, like:

USA
Engeland
Nederland
Nieuw Zeeland
Nederland
USA
USA
Engeland
Engeland
Engeland
Engeland
Nieuw Zeeland
Frankrijk
USA
USA
Engeland
Engeland


Now, I want to automatically count the frequencies of all the
countries in this column. . . . Is there a way in which Excel
can make this list automatically?

Name the list above Lst. Then enter these formulas.

D1:
=INDEX(Lst,1)

E1:
=COUNTIF(Lst,D1)

D2 [array formula]:
=INDEX(Lst,MATCH(0,COUNTIF(D$1:D1,Lst),0))

Select D2 and fill down as needed. The formulas will return #N/A when the
distinct entries in the list have been exhausted. Then select E1 and fill
down to match the formulas in column D.
What I really want with it is create a Pareto graph. Unfortunately,
when I use the Statistical Analysis tool, trying to create a
histogram, I receive an error, saying that the data are non-numeric.
Well, of course the country names are nonnumeric, but why would that
make it impossible to create a histogram?

Because histograms are necessarily indexed by number and have numeric
values. What you want is a bar chart. Don't bother using the statistical
tools since they won't work with your data anyway. Use the Insert > Chart
menu command.
 
L

Leo Heuser

Hello Niko

A different way of doing it:

Countries in B2 and down.

In e.g. C2 enter this array formula:

=MMULT(TRANSPOSE((B2=$B$2:$B$200)+0),ROW($B$2:$B$200)^0)

enter with <Shift><Ctrl><Enter>, also if edited later.

Copy C2 down.
 
N

Niko Roorda

Thank you. But it doesn't work as it should; the 2nd INDEX function renders
"not available". So something must be wrong.
For me, the way you use the COUNTIF function is new, and I don't understand
it. The second argument you use is Lst, and this is the position where the
criteria should be. How can a range of cells be the criteria?

Niko



Harlan Grove said:
Niko Roorda said:
I have a spreadsheet which contains (a.o.) a column with country
names, like:

USA
Engeland
Nederland
Nieuw Zeeland
Nederland
USA
USA
Engeland
Engeland
Engeland
Engeland
Nieuw Zeeland
Frankrijk
USA
USA
Engeland
Engeland


Now, I want to automatically count the frequencies of all the
countries in this column. . . . Is there a way in which Excel
can make this list automatically?

Name the list above Lst. Then enter these formulas.

D1:
=INDEX(Lst,1)

E1:
=COUNTIF(Lst,D1)

D2 [array formula]:
=INDEX(Lst,MATCH(0,COUNTIF(D$1:D1,Lst),0))

Select D2 and fill down as needed. The formulas will return #N/A when the
distinct entries in the list have been exhausted. Then select E1 and fill
down to match the formulas in column D.
What I really want with it is create a Pareto graph. Unfortunately,
when I use the Statistical Analysis tool, trying to create a
histogram, I receive an error, saying that the data are non-numeric.
Well, of course the country names are nonnumeric, but why would that
make it impossible to create a histogram?

Because histograms are necessarily indexed by number and have numeric
values. What you want is a bar chart. Don't bother using the statistical
tools since they won't work with your data anyway. Use the Insert > Chart
menu command.
 
N

Niko Roorda

Thank you! I had never heared of this tool. It took me some time to figure
it all out, but it works great. It's exactly what I need.

Niko
 
N

Niko Roorda

Well, I tried it. But the result is just equal to a situation in which I
insert in C2 (and copy down):

=COUNTIF($B$2:$B$125;$B2)

In other words, your formula just counts, and it renders numbers.

Niko
 
H

Harlan Grove

Niko Roorda said:
Thank you. But it doesn't work as it should; the 2nd INDEX function
renders "not available". So something must be wrong.
For me, the way you use the COUNTIF function is new, and I don't
understand it. The second argument you use is Lst, and this is the
position where the criteria should be. How can a range of cells be
the criteria? ....
"Harlan Grove" <[email protected]> schreef in bericht ....
D1:
=INDEX(Lst,1)

E1:
=COUNTIF(Lst,D1)

D2 [array formula]:
=INDEX(Lst,MATCH(0,COUNTIF(D$1:D1,Lst),0))
....

The second INDEX formula is an *array* formula. Type it in a cell, then hold
down [Ctrl] and [Shift] keys before pressing [Enter].

As for argument order in its COUNTIF call, by making the range, Lst, the
criteria, there are multiple *separate* criteria, and the result from
COUNTIF will be an array, which is then searched by MATCH.

For example, if A1 contained 5, B1:B6 contained {1;3;5;4;5;0}, then
COUNTIF(A1,B1:B6) would return {0;0;1;0;1;0}.
 
L

Leo Heuser

Niko Roorda said:
Well, I tried it. But the result is just equal to a situation in which I
insert in C2 (and copy down):

=COUNTIF($B$2:$B$125;$B2)

In other words, your formula just counts, and it renders numbers.

Niko

You are quite right, what a masterpiece <bg>
Wonder where my thoughts were, when I delivered that one.
Sorry about that!

Regards
Leo Heuser
 
N

Niko Roorda

Wow, I see! That is great.

The result is the frequency table that I look for. An advantage in
comparison with the Pivot method that Andy B introduced me to is, that the
data are dynamically linked, and are immediately updated; a pivot table has
to be updated by hand, every time the data change.

A disadvantage of your suggestion is, as it seems to me, that it is
impossible to order the resulting table so that the highest frequency is
shown first, etc. Do you know of a way to solve this problem?

Niko



Harlan Grove said:
Niko Roorda said:
Thank you. But it doesn't work as it should; the 2nd INDEX function
renders "not available". So something must be wrong.
For me, the way you use the COUNTIF function is new, and I don't
understand it. The second argument you use is Lst, and this is the
position where the criteria should be. How can a range of cells be
the criteria? ...
"Harlan Grove" <[email protected]> schreef in bericht ...
D1:
=INDEX(Lst,1)

E1:
=COUNTIF(Lst,D1)

D2 [array formula]:
=INDEX(Lst,MATCH(0,COUNTIF(D$1:D1,Lst),0))
...

The second INDEX formula is an *array* formula. Type it in a cell, then hold
down [Ctrl] and [Shift] keys before pressing [Enter].

As for argument order in its COUNTIF call, by making the range, Lst, the
criteria, there are multiple *separate* criteria, and the result from
COUNTIF will be an array, which is then searched by MATCH.

For example, if A1 contained 5, B1:B6 contained {1;3;5;4;5;0}, then
COUNTIF(A1,B1:B6) would return {0;0;1;0;1;0}.
 
H

Harlan Grove

...
...
A disadvantage of your suggestion is, as it seems to me, that it is
impossible to order the resulting table so that the highest frequency is
shown first, etc. Do you know of a way to solve this problem?
...

Column E formulas remain as-is. Change column D formulas as follows.

D1:
=INDEX(Lst,MODE(MATCH(Lst,Lst,0)))

D2 [array formula]:
=INDEX(Lst,MODE(IF(COUNTIF(D$1:D1,Lst)=0,MATCH(Lst,Lst,0)+{0,0})))

Select D2 and fill down as needed. The formulas return #N/A when the distinct
entries in Lst have been exhausted.

The '+{0,0}' term is necessary in case the least frequently appearing entries in
Lst appear only once. MODE returns #N/A unless at least one number in its
arguments appears at least twice. Adding {0,0} to the IF(..) result ensures
there are at least two instances of 1 when the least frequent entry in Lst
appears only once.

As for ties in Lst, the column D formulas return tying entries in the order in
which they appear in Lst. If you want them in alphabetical order, the formulas
get more complicated. Now all column D formulas are array formulas.

D1:
=INDEX(Lst,MATCH(MIN(IF(COUNTIF(Lst,Lst)=MAX(COUNTIF(Lst,Lst)),
COUNTIF(Lst,"<="&Lst))),COUNTIF(Lst,"<="&Lst),0))

D2:
=INDEX(Lst,MATCH(MIN(IF(COUNTIF(Lst,Lst)
*(COUNTIF(D$1:D1,Lst)=0)=LARGE(COUNTIF(Lst,Lst),SUM(E$1:E1,1)),
COUNTIF(Lst,"<="&Lst))),COUNTIF(Lst,"<="&Lst),0))

These column D formulas will return #NUM! when the distinct entries in Lst have
been exhausted.
 
N

Niko Roorda

Wow! You are great. Your formulas are very complicated, and they work, I
tested them!
Thanks!

In fact, the array formula in the D column returns 0 instead of #Num. Now
it's easy to avoid those zeros totally by hiding the D column and inserting
a new column with a formula like =IF(D1>0,D1,"").
Of course, this can be combined with your formula into one formula, which is
more than twice as long as your formula (becuase it repeats it twice), but
that is a bit ugly. Perhaps there is a more elegant solution for this, but
that is not important.

Niko


Harlan Grove said:
...
..
A disadvantage of your suggestion is, as it seems to me, that it is
impossible to order the resulting table so that the highest frequency is
shown first, etc. Do you know of a way to solve this problem?
..

Column E formulas remain as-is. Change column D formulas as follows.

D1:
=INDEX(Lst,MODE(MATCH(Lst,Lst,0)))

D2 [array formula]:
=INDEX(Lst,MODE(IF(COUNTIF(D$1:D1,Lst)=0,MATCH(Lst,Lst,0)+{0,0})))

Select D2 and fill down as needed. The formulas return #N/A when the distinct
entries in Lst have been exhausted.

The '+{0,0}' term is necessary in case the least frequently appearing entries in
Lst appear only once. MODE returns #N/A unless at least one number in its
arguments appears at least twice. Adding {0,0} to the IF(..) result ensures
there are at least two instances of 1 when the least frequent entry in Lst
appears only once.

As for ties in Lst, the column D formulas return tying entries in the order in
which they appear in Lst. If you want them in alphabetical order, the formulas
get more complicated. Now all column D formulas are array formulas.

D1:
=INDEX(Lst,MATCH(MIN(IF(COUNTIF(Lst,Lst)=MAX(COUNTIF(Lst,Lst)),
COUNTIF(Lst,"<="&Lst))),COUNTIF(Lst,"<="&Lst),0))

D2:
=INDEX(Lst,MATCH(MIN(IF(COUNTIF(Lst,Lst)
*(COUNTIF(D$1:D1,Lst)=0)=LARGE(COUNTIF(Lst,Lst),SUM(E$1:E1,1)),
COUNTIF(Lst,"<="&Lst))),COUNTIF(Lst,"<="&Lst),0))

These column D formulas will return #NUM! when the distinct entries in Lst have
been exhausted.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top