Excel - Interquartile Range Miscalulation

E

EngelhardtMD

I calculate the Interquartile Range for the following values both by hand and
using the Minitab Statistical Package I get an IQR = 20 that is the 3rd
quartile of 105 - the first quartile of 85 but when I use Excel 2007 I get
IQR = 15 = (102.5-87.5).

Can anyone explain why this is?

Sincerely,

Mark D. Engelhardt, Ph.D.
(e-mail address removed)
636.293.0901

55
75
80
85
90
95
100
100
100
100
100
105
115
120
135
 
J

Jerry W. Lewis

There is no universally accepted definition of a percentile from sample
data. Hyndman and Fan, 1996, "Sample Quantiles in Statistical Packages", The
American Statistician 50(4):361-365,1996 discuss 9 different definitions and
reference some others.

Excel (also the quantile() function in S-PLUS and R) uses Hyndman and Fan's
7th definition, which considers the min and max to be the 0th and 100th
percentiles, and equally spaces the quantiles (interpolating as needed)
corresponding to intervening observations. This gives a reasonable
description of the sample, but is almost certainly biased as an estimate of
the underlying population quantiles.

At the time of their article (I am not aware of any subsequent changes),
Minitab's DESCRIBE used the 6th definition, and %DESCRIBE used the 2nd
definition.

SAS gives several options ...

Jerry
 
E

EngelhardtMD

Jerry,

It would seem to me that Excel's use of Hyndman and Fan's 7th definition
using the minimum value and maximum value as the 0th percentile and 100th
percentile respectively would be extremely sensitive to extreme outliers.

It was my understanding that one of the advantages of using the IQR was that
it was less sensitve to extreme outliers than was the variance.

Mark
 
J

Jerry W. Lewis

My last reply does not appear to have communicated with you, sorry.

The interquartile range is calculated using the 1st & 3rd sample quartiles,
but there are various ways to calculate those quartiles.

Five of Hyndman and Fan's sample quantile definitions have a particularly
simple common form given by the following VBA code. You select among the
methods according to which definition of m you uncomment.

Function quantile(data, p)
' p=fraction of population, e.g. p=0.25 for 1st quartile
n = WorksheetFunction.Count(data)
' m = 0 ' H&F 4: SAS (PCTLDEF=1), R (type=4), Maple (method=3)
' m = 0.5 ' H&F 5: R (type=5), Maple (method=4)
' m = p ' H&F 6: Minitab, SPSS, BMDP, JMP, SAS (PCTLDEF=4), R
(type=6), Maple (method=5)
' m = 1 - p ' H&F 7: Excel, S-Plus, R (type=7[default]), Maxima, Maple
(method=6)
' m = (p+1)/3 ' H&F 8: R (type=8), Maple (method=7[default])
' m=(p+1.5)/4 ' H&F 9: R (type=9), Maple (method=8)
npm = n * p + m
j = Fix(npm): If j = 0 Then j = 1
If j > n Then j = n
g = npm - j
quantile = WorksheetFunction.Small(data, j)
If g >= 0 And j < n Then
quantile = (1 - g) * quantile + g * WorksheetFunction.Small(data, j
+ 1)
End If
End Function


Excel, S-Plus, etc use H&F definition 7, which returns SMALL(data,i) as
quantile(data,(i-1)/(n-1)) and interpolates in between. For a continuous
distribution, this will tend to give too narrow an interquartile range, since
there will tend to be a small fraction of the population beyond the extreme
sample observations. In particular, for odd n (=2*k+1), Excel calculates the
1st (3rd) quartile as the median of the lower (upper) "half" of the sample
including the sample median (k+1 observations).

Minitab, etc use H&F definition 6, which calculates the 1st (3rd) quartile
as the median of the lower (upper) "half" of the sample. This "half" sample
excludes the sample median (k observations) for odd n (=2*k+1). This will
tend to be a better estimate for the population quartiles, but will tend to
give quartile estimates that are a bit too far from the center of the whole
sample (too wide an interquartile range).

Hyndman and Fan recommend their definition 8 (Maple's default definition),
which gives quartiles between those reported by Minitab and Excel. This
approach is approximately median unbiased for continuous distributions.

Jerry
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top