Algorithm for computing Excel PercentRank

  • Thread starter Thread starter Kam Mistry
  • Start date Start date
K

Kam Mistry

I am current trying to determine how Excel calculates PercentRank.

The project I am working on requires the use of this function - and we
cannot use COM automation to call Excel.

The only documentation I could find was the following:

Excel Function: PERCENTRANK(x, array)

- If x matches one of the values in the array, this function is
equivalent to the Excel formula = (RANK(x)-1)/(N-1) where N is the
number of data points.

- If x does not match one of the values, then the PERCENTRANK function
interpolates.

Rank(x) is easy to compute. The second statement is where I'm having
trouble with, how does Excel "interpolates" the percentage rank? What
method does it use?

Thanks in advance,
Kam Mistry
Microsoft .NET Developer (C#)
 
Kam Mistry wrote...
I am current trying to determine how Excel calculates PercentRank.

The project I am working on requires the use of this function - and we
cannot use COM automation to call Excel.
....

But you need to duplicate Excel's PERCENTRANK sematics exactly? If so,
PERCENTRANK(Rng,x) needs to return #N/A or some other error indicator
when x > MAX(Rng).

Also, PERCENTRANK rounds differently than the formula you mention. For
example, with 1..8 in G1:G8 and 5 in G10,

PERCENTRANK(G1:G8,G10,6) returns 0.571428
ROUND((RANK(G10,G1:G8,1)-1)/7,6) returns 0.571429

Rank(x) is easy to compute. The second statement is where I'm having
trouble with, how does Excel "interpolates" the percentage rank? What
method does it use?

What ever happened to exploratory data analysis? It's simple linear
interpolation between the points in the range bracketting your second
argument value. That is,

=TREND(PERCENTRANK(Rng,
{1;0}*MAX(IF(Rng<=x,Rng))+{0;1}*MIN(IF(Rng>=x,Rng)),6),
{1;0}*MAX(IF(Rng<=x,Rng))+{0;1}*MIN(IF(Rng>=x,Rng)),x)

or, since you're going to be programming this,


if (YourErrorCheckFcnHere(pr = YourPercentRankFcnHere(rng, x)) {
for (i = 0, xlo = xhi = x; i < sizeof(rng) / sizeof(*rng); ++i) {
if (xlo < rng && rng < x) xlo = rng;
if (xhi > rng && rng > x) xhi = rng;
}

ylo = YourPercentRankFcnHere(rng, xlo);
yhi = YourPercentRankFcnHere(rng, xhi);

pr = ((xhi - x) * ylo + (x - xlo) * yhi) / (xhi - xlo);
}


If rng is sorted, use binary search rather than simple iteration to
locate the points in rng bracketting x. I assumed rng was a simple
array. Adapt for more sophisticated data structures.
 
Thanks for you help! Our test cases against Excel PercentRank work.

Cheers,
Kam


Harlan Grove said:
Kam Mistry wrote...
I am current trying to determine how Excel calculates PercentRank.

The project I am working on requires the use of this function - and we
cannot use COM automation to call Excel.
...

But you need to duplicate Excel's PERCENTRANK sematics exactly? If so,
PERCENTRANK(Rng,x) needs to return #N/A or some other error indicator
when x > MAX(Rng).

Also, PERCENTRANK rounds differently than the formula you mention. For
example, with 1..8 in G1:G8 and 5 in G10,

PERCENTRANK(G1:G8,G10,6) returns 0.571428
ROUND((RANK(G10,G1:G8,1)-1)/7,6) returns 0.571429

Rank(x) is easy to compute. The second statement is where I'm having
trouble with, how does Excel "interpolates" the percentage rank? What
method does it use?

What ever happened to exploratory data analysis? It's simple linear
interpolation between the points in the range bracketting your second
argument value. That is,

=TREND(PERCENTRANK(Rng,
{1;0}*MAX(IF(Rng<=x,Rng))+{0;1}*MIN(IF(Rng>=x,Rng)),6),
{1;0}*MAX(IF(Rng<=x,Rng))+{0;1}*MIN(IF(Rng>=x,Rng)),x)

or, since you're going to be programming this,


if (YourErrorCheckFcnHere(pr = YourPercentRankFcnHere(rng, x)) {
for (i = 0, xlo = xhi = x; i < sizeof(rng) / sizeof(*rng); ++i) {
if (xlo < rng && rng < x) xlo = rng;
if (xhi > rng && rng > x) xhi = rng;
}

ylo = YourPercentRankFcnHere(rng, xlo);
yhi = YourPercentRankFcnHere(rng, xhi);

pr = ((xhi - x) * ylo + (x - xlo) * yhi) / (xhi - xlo);
}


If rng is sorted, use binary search rather than simple iteration to
locate the points in rng bracketting x. I assumed rng was a simple
array. Adapt for more sophisticated data structures.
 
Here's the C# function I wrote where "listOfRankItems" is an array of
struct:


private float PercentRank(rankItem[] listOfRankItems, float valueX)
{

/*--------------------------------------------------------------------------
---------

Calculate the PERCENTRANK(array, x)

If X matches one of the values in the array, this function is
equivalent to
the Excel formula =(RANK(x)-1)/(N-1) where N is the number of data
points.

If X does not match one of the values, then the PERCENTRANK function
interpolates.

----------------------------------------------------------------------
-------------*/

int numberOfItems = listOfRankItems.Length;
float resultPR = valueX;
bool foundX = false;

for (int index = 0; index < numberOfItems; index++)
{
if (listOfRankItems[index].WealthAmount == valueX)
{
resultPR = ((float)index)/((float)(numberOfItems - 1));
foundX = true;
break;
}
}

// calculate value using linear interpolation

if (!foundX)
{
float x1, x2, y1, y2;

x1 = x2 = valueX;

foundX = false;

for (int index = 0; index < numberOfItems - 1; index++)
{
if ((listOfRankItems[index].WealthAmount < valueX && valueX <
listOfRankItems[index + 1].WealthAmount) ||
(listOfRankItems[index].WealthAmount > valueX && valueX >
listOfRankItems[index + 1].WealthAmount))
{
x1 = listOfRankItems[index].WealthAmount;
x2 = listOfRankItems[index + 1].WealthAmount;
foundX = true;
break;
}
}

if (foundX)
{
y1 = PercentRank(listOfRankItems, x1);
y2 = PercentRank(listOfRankItems, x2);

resultPR = (((x2 - valueX)*y1 + (valueX - x1)*y2)) / (x2 - x1);
}
else
{
// use the smallest or largest value in the set which ever is
closer to valueX

if (Math.Abs(listOfRankItems[0].WealthAmount - valueX) <
Math.Abs(valueX - listOfRankItems[numberOfItems-1].WealthAmount))
x1 = listOfRankItems[0].WealthAmount;
else
x1 = listOfRankItems[numberOfItems-1].WealthAmount;

resultPR = PercentRank(listOfRankItems, x1);
}
}

return resultPR;
}



Harlan Grove said:
Kam Mistry wrote...
I am current trying to determine how Excel calculates PercentRank.

The project I am working on requires the use of this function - and we
cannot use COM automation to call Excel.
...

But you need to duplicate Excel's PERCENTRANK sematics exactly? If so,
PERCENTRANK(Rng,x) needs to return #N/A or some other error indicator
when x > MAX(Rng).

Also, PERCENTRANK rounds differently than the formula you mention. For
example, with 1..8 in G1:G8 and 5 in G10,

PERCENTRANK(G1:G8,G10,6) returns 0.571428
ROUND((RANK(G10,G1:G8,1)-1)/7,6) returns 0.571429

Rank(x) is easy to compute. The second statement is where I'm having
trouble with, how does Excel "interpolates" the percentage rank? What
method does it use?

What ever happened to exploratory data analysis? It's simple linear
interpolation between the points in the range bracketting your second
argument value. That is,

=TREND(PERCENTRANK(Rng,
{1;0}*MAX(IF(Rng<=x,Rng))+{0;1}*MIN(IF(Rng>=x,Rng)),6),
{1;0}*MAX(IF(Rng<=x,Rng))+{0;1}*MIN(IF(Rng>=x,Rng)),x)

or, since you're going to be programming this,


if (YourErrorCheckFcnHere(pr = YourPercentRankFcnHere(rng, x)) {
for (i = 0, xlo = xhi = x; i < sizeof(rng) / sizeof(*rng); ++i) {
if (xlo < rng && rng < x) xlo = rng;
if (xhi > rng && rng > x) xhi = rng;
}

ylo = YourPercentRankFcnHere(rng, xlo);
yhi = YourPercentRankFcnHere(rng, xhi);

pr = ((xhi - x) * ylo + (x - xlo) * yhi) / (xhi - xlo);
}


If rng is sorted, use binary search rather than simple iteration to
locate the points in rng bracketting x. I assumed rng was a simple
array. Adapt for more sophisticated data structures.
 
case anyone is interested (like me in the future)
here's a php equiv:

*********************************************


Code:
function PercentRank($listOfRankItems, $valueX) {

/*--------------------------------------------------------------------------
---------

Calculate the PERCENTRANK(array, x)

If X matches one of the values in the array, this function is
equivalent to
the Excel formula =(RANK(x)-1)/(N-1) where N is the number of data
points.

If X does not match one of the values, then the PERCENTRANK function
interpolates.

----------------------------------------------------------------------
-------------*/

	$numberOfItems = count($listOfRankItems);
	$resultPR = $valueX;

	$foundX = false;

	for ($index = 0; $index < $numberOfItems; $index++) {
		if ($listOfRankItems[$index] == $valueX)
		{
			$resultPR = ((float)$index)/((float)($numberOfItems - 1));
			$foundX = true;
			break;
		}
	}

	// calculate value using linear interpolation
	
	if (!$foundX) {
	
		$x1 = $x2 = $valueX;
		$foundX = false;
	
		for ($index = 0; $index < $numberOfItems - 1; $index++){
			if (($listOfRankItems[$index] < $valueX && $valueX < $listOfRankItems[$index + 1]) ||
					($listOfRankItems[$index] > $valueX && $valueX > $listOfRankItems[$index + 1])) {
				$x1 = $listOfRankItems[index];
				$x2 = $listOfRankItems[index + 1];
				$foundX = true;
				break;
			}
		}
		
		if ($foundX)
		{
			$y1 = PercentRank($listOfRankItems, $x1);
			$y2 = PercentRank($listOfRankItems, $x2);
			
			$resultPR = ((($x2 - $valueX)*$y1 + ($valueX - $x1)*$y2)) / ($x2 - $x1);
		}
		else {
			// use the smallest or largest value in the set which ever is closer to valueX
			
			if (abs($listOfRankItems[0] - $valueX) < abs($valueX - $listOfRankItems[$numberOfItems-1])) {
				$x1 = $listOfRankItems[0];
			}
			else {
				$x1 = $listOfRankItems[$numberOfItems-1];
			}
			$resultPR = PercentRank($listOfRankItems, $x1);
		}
	}

	return $resultPR;
}
 
easier version for non-mathmos.

A1:A18 contain the data. B1:B18 contain the outputs.

In this scenario, lowest number is 1 and highest number is 0.


B1: =(RANK(A1,A$1:A$18)-1)/17*10
B2: =(RANK(A2,A$1:A$18)-1)/17*10

etc

Comes to within 0.001 of Excel's answer
 
Back
Top