Returning all data from a list above the nth percentile

C

chlor

Hello,

Is there any function in Excel that returns all values from a list
above or below a certain percentage rank/percentile?

For instance, if you want the average of the 2% highest numbers in a
list? Or the sum of the 25% most expensive items in a shopping
catalogue?

It has to be a "dynamic" function, where you can just drop in a set of
figures, sort the list, add a percentile, and perform a calculation on
all the figures from the cutoff point signalled by the percentile...

Hope anyone can help me out here.

Thanks
Christian
 
V

vezerid

Chlor

you can base your aggregate functions on virtual arrays, which have the
same length as your data and include either the original numbers or
FALSE in their corresponding positions, for numbers meeting certain
criteria. Logical values are ignored in such functions. Example:

=AVERAGE(IF(A1:A10>PERCENTILE(A1:A10),A1:A10))

These are *array* formulas, thus you enter them with Shift+Ctrl+Enter.

Does this help?

Kostis Vezerides
 
C

chlor

I do have a follow up question:

If you want to apply the above function to categories, for instance:

95-100%
60-95%
30-60%
10-30%
0-10%

How do you write these functions?


regards
Christian
 
V

vezerid

Christian,

I just saw your message.

So you mean you want the average of those in the top 5%, then the next
35% etc? I think it is the inverse function of PERCENTILE, i.e.
PERCENTRANK that you must use. Given a data set in A1:A10 and a value
in the same scale as the dataset in B1, then

=PERCENTRANK(A1:A10,B1)

will give you the percentage of data with a value below. Thus, to get
the average of those in the 60-95% you would say something like:

=AVERAGE(IF(AND(PERCENTRANK(A1:A10,A1:10)>0.6,PERCENTRANK(A1:A10,A1:A10)<0.95),A1:A10))

Does this help?

Kostis
 
C

chlor

Hello again,

I think we are getting closer, but the function returns #N/A.

This is the function I have used:

{=AVERAGE(IF(AND(PERCENTRANK(G2:G150; G2:G150)>0,6;
PERCENTRANK(G2:G150; G2:G150)<0,95); G2:G150))}

Any further suggestions?
 
V

vezerid

Christian,

As you see I do not immediately follow up because of my time zone... I
must admist I had not tested the formula I gave you. Now I tested it
with a dataset in A1:A10. When the dataset was empty it returned #N/A
(ok, this would make sense). WHen I entered data it returned 0. This
did not make sense. I broke the formula as follows, using intermediate
results:

In column B:B:
=PERCENTRANK($A$1:$A$10,A1)
In column C:C
=IF(AND(PERCENTRANK($A$1:$A$10,A1)>0.6,PERCENTRANK($A$1:$A$10,A1)<0.95),A1)
In a separate cell:
=AVERAGE(C1:C10)

This worked. I am still baffled about this and I will probably post a
question myself. Nothing in the documentation says anything about such
behavior. At any rate, if you can afford the luxury of auxiliary
columns you can adopt the approach above.

HTH
Kostis
 
V

vezerid

OK, just clarified it though my own post. Try this:

=AVERAGE(IF((PERCENTRANK(A1:A10,A1:A10)>0.6)*(PERCENTRANK(A1:A10,A1:A10)<0.95),A1:A10))

Again array entered.

Regards

Kostis
 
C

chlor

Great Kostis, it works just fine, thanks a lot.

To make this function completely dynamic, I need the cell references to
be open-ended, as the function returns #N/A if the cell references do
not match exactly the number of cells with numbers in them.

This means that I can not define the function to a general cell
reference (fex, having sets with various amounts of numbers that needs
to be calculated, I could generalise and set every cell reference in
all the functions to fex A1:A5000, knowing that my sets of numbers
never exceed 4000) in order to be able to just drop in columns of
numbers afterwards, and I have to adjust every function to match
exactly the number of columns/rows that I have filled with numbers each
time.

I have searched for solutions with regards to open-ended cell
references, but can not find anything.

Do you have any clue as to how to solve this? I know I am asking a lot
here, this have really been of great help to me.
 
V

vezerid

Christian,

You lucky dog <g>. Yes, it can be done the way you ask. The following
formula will do this, based on the indirect function. As you see,
essentially we are replacing the expression A1:A10 with the expression:

INDIRECT("A1:A"&COUNT(A:A))

This assumes that your data start from 1 and subsequently you have all
numbers. If for example your data start from A2, it should be:

INDIRECT("A2:A"&COUNT(A:A)+1)

An alternative is to write the range, say, A2:A3562 in a cell E1 (enter
it as simple text) and use instead

INDIRECT(E1)

The full formula for cells starting at A1 is:

=AVERAGE(IF((PERCENTRANK(INDIRECT("A1:A"&COUNT(A:A)),INDIRECT("A1:A"&COUNT(A:A)))>0.6)*(PERCENTRANK(INDIRECT("A1:A"&COUNT(A:A)),INDIRECT("A1:A"&COUNT(A:A)))<0.95),INDIRECT("A1:A"&COUNT(A:A))))

Again to be array entered.

Regards,
Kostis
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top