Standard deviation calculation error...


Guest

Hi,

I was reviewing my preset worksheet, which uses the standard deviation
function, and noticed that the output contains an error.

The formula "=STDEV(K12:K111)" was set up to calculate the standard deviation
of up to 100 values in the column, where the values can be whole numbers or
fractions. The formula works perfectly with whole numbers; however, with
fractions (numbers with decimal places), the output is invalid (in some
sense).

You can test this: with 60 values of "1" in a column, the STDEV is "0".
However, applying the same formula to 60 values of the fraction "0.05" in a
column gives a STDEV of "4.9E-17". Mathematically, this is definitely a wrong
answer. If you break down the calculation steps of the STDEV formula
"s^2 = (∑(x-m)^2)/N" into multiple columns, you can see that the culprit lies
in the "∑(x-m)^2" term, where x is the individual value and m is the mean.

Example below demonstrates the breakdown of the column calculations:
x      m      N    ∑(x-m)^2    s^2         s
0.05   0.05   1    0           0           0
0.05   0.05   2    0           0           0
0.05   0.05   3    1.44E-34    4.81E-35    6.94E-18
0.05   0.05   4    0           0           0
0.05   0.05   5    0           0           0
0.05   0.05   6    2.89E-34    4.81E-35    6.94E-18
0.05   0.05   7    3.37E-34    4.81E-35    6.94E-18
:
:
0.05   0.05   58   1.01E-31    1.73E-33    4.16E-17
0.05   0.05   59   1.02E-31    1.73E-33    4.16E-17
0.05   0.05   60   1.42E-31    2.36E-33    4.86E-17

*Note that ∑(x-m)^2 is calculated using the array formula
"{=SUM(($A$2:A61-B61)^2)}", since the mean is recalculated on every row.

Is this due to a limitation of the Excel formula (STDEV only works for whole
numbers)? Or is there any patch for this error?
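
For anyone who wants to reproduce the column breakdown above outside Excel, here is a rough Python sketch. The function name running_breakdown is made up for illustration, and it divides by N as in the formula quoted in the post (STDEV itself divides by N-1). The exact residues depend on the order of the floating-point additions, so they may not match the Excel figures digit for digit, and some rows may come out as exactly zero.

```python
import math

def running_breakdown(x, count):
    """Return rows (N, sum_sq_dev, s2, s) for the first N copies of x, N = 1..count."""
    rows = []
    for n in range(1, count + 1):
        # Plain left-to-right accumulation, so the result does not depend
        # on how the built-in sum() is implemented.
        total = 0.0
        for _ in range(n):
            total += x
        mean = total / n                # column m
        ss = n * (x - mean) ** 2        # column ∑(x-m)^2 (all values are equal)
        s2 = ss / n                     # divides by N, as in the post
        rows.append((n, ss, s2, math.sqrt(s2)))
    return rows

rows = running_breakdown(0.05, 60)
for n, ss, s2, s in rows:
    if ss != 0.0:
        print(n, ss, s2, s)             # rows where rounding noise shows up
```

Any rows printed are the ones where the computed mean differs from 0.05 in the last binary digits.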
 
Guest

Hi,
You may test, with 60 sets of values "1" in a column, the STDEV is "0".
However, when applying the same formula to a fraction value "0.05", 60 sets
in a column, the STDEV is "4.9E-17".

On my machine Excel calculates the standard deviation of 60 data points of
0.05 correctly as zero. It only appears as 4.9E-17 if the cell is formatted
as General. Format it as Number to get the correct answer.

Mike
 
JE McGimpsey

No, STDEV "works" for fractional as well as whole numbers, but you are
seeing the limitation of IEEE Double Precision Floating Point math,
which is used by XL and most other spreadsheets. There's no patch,
because XL can't determine whether the small rounding error is real or
an artifact.

Nearly all numbers cannot be exactly represented in a finite number of
binary digits, just as the number 1/3 cannot be represented in a finite
number of decimal digits. So you should expect small errors due to
rounding. You can use ROUND() to filter out these errors, e.g.,

=ROUND(STDEV(A2:A61),9)

See

http://cpearson.com/excel/rounding.aspx

for more details
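
As a small illustration of the two points above (not Excel itself, just Python, which uses the same IEEE double format), the sketch below prints the value actually stored for 0.05 and shows rounding to 9 decimals removing noise of the size reported in this thread. The variable names are only for illustration, and 4.9e-17 is typed in as a literal rather than recomputed.

```python
from decimal import Decimal

# Decimal(float) shows the exact value of the binary double nearest to 0.05.
stored = Decimal(0.05)
intended = Decimal("0.05")
print(stored)               # slightly more than 0.05
print(stored == intended)   # False

# Rounding to 9 decimal places, in the spirit of ROUND(STDEV(...), 9).
noisy_stdev = 4.9e-17       # figure reported in the thread, used as a literal
cleaned = round(noisy_stdev, 9)
print(cleaned == 0.0)       # True
```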
 
JE McGimpsey

Changing the format does nothing to the underlying value in the cell. Using
the Number format simply makes the display engine round the value that is
shown.

It won't change the result, however:

B1: =STDEV(A2:A60)
B2: =B1=0

If you increase the number of displayed decimals, you'd see
0.000000000000000049, and B2 still returns FALSE.
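
The same distinction between the displayed and the stored value can be sketched in Python. This is only an analogy to cell formatting, with the 4.9E-17 result typed in as a literal and variable names chosen for illustration.

```python
stdev_value = 4.9e-17                    # result reported in the thread, as a literal

shown_as_number = f"{stdev_value:.2f}"   # like a Number format with 2 decimals
shown_in_full = f"{stdev_value:.18f}"    # like displaying 18 decimals
is_zero = (stdev_value == 0)             # like =B1=0

print(shown_as_number)   # 0.00
print(shown_in_full)     # 0.000000000000000049
print(is_zero)           # False
```

Formatting only changes the string that is shown; the comparison still sees the tiny nonzero value.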
 
Guest

Hi Mike,

Thanks for the reply. I've also tested that setting. However, in agreement
with McGimpsey's reply (after yours), the result is still not zero. That's
OK: I accept McGimpsey's explanation of the limitation and will just have to
use the rounding feature.

Thanks, though.
 
Guest

Hi McGimpsey,

Thanks for the information; it's interesting to learn about one part of the
calculation backbone. I think I will just have to use the ROUND function to
close the loop.

Thanks again.
 
Guest

While you are correct that this is a direct result of the limitations of IEEE
standard double precision (common to almost all general-purpose software),
and the change in 2003 to a 2-pass algorithm for STDEV, VAR, etc. is a welcome
numerical improvement, there is still room for improvement in Excel's
algorithms here. If Excel used updating algorithms
http://groups.google.com/group/microsoft.public.excel.misc/msg/4c6ee0c636ad016a
for AVERAGE, STDEV, COVAR, etc., then STDEV would always return zero for
constant data, GEOMEAN would never overflow unless individual observations
overflowed, ...
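
To give a feel for what an updating algorithm looks like, here is a minimal Python sketch of a Welford-style one-pass recurrence. It is an assumption that this resembles what the linked post describes; it is not taken from that post and is not Excel's implementation, and the name welford_stdev is made up for illustration.

```python
import math

def welford_stdev(values):
    """Sample standard deviation (N-1 divisor) via a one-pass updating recurrence."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean          # deviation from the mean before the update
        mean += delta / n         # updated running mean
        m2 += delta * (x - mean)  # uses deviations before and after the update
    if n < 2:
        raise ValueError("need at least two values")
    return math.sqrt(m2 / (n - 1))

print(welford_stdev([0.05] * 60))  # 0.0
```

For constant data the first value sets the mean exactly, so every later deviation is exactly zero and no rounding noise can accumulate.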

Jerry
 
JE McGimpsey

Thanks for the correction/amplification!

You're right - I should have said that the result is limited by the
combination of the inherent limits of finite precision representation
*and* the quality of implementation of the algorithm(s) involved.
 
