Excel beta - another bug

Harlan Grove · Jun 6, 2006

Someone else please confirm this.

A:A filled with the formula =ROW()^2.

C1:
=SUMPRODUCT(ROW(A:A)^2)

C2:
=SUM(A:A)

C3: [array formula]
=SUMSQ(ROW(A:A))

C1:C3 all appear to show the same value, 3.84307717958020E+17. And the
next 3 formulas all return true.

C4:
=C1=C2

C5:
=C2=C3

C6:
=C1=C3

But the following formulas aren't consistent with this.

C7:
=STDEV(C1:C3) returns 8589934592.0

C8:
=DEVSQ(C1:C3) returns 12288

FWIW, the exact result, per Mathematica, should be
3.843077179582709760E+17, so the XL12 result is affected by truncation
error at the 12th decimal digit. On the other hand, in XL11, with the
following formula entered into A1:AF32768

=((COLUMN()-1)*32768+ROW())^2

each of the following formulas return 3.84307717958226E+17

AH1:
=SUMPRODUCT(A1:AF32768)

AH2:
=SUM(A1:AF32768)

AH3: [array formula]
=SUMSQ((COLUMN(A:AF)-1)*32768+ROW(1:32768))

And both the following formulas return 0.

AH4:
=STDEV(AH1:AH3)

AH5:
=DEVSQ(AH1:AH3)

The XL11 result is closer to the actual value than the XL12 answer, so
it appears something has changed between XL11 and XL12, which is to be
expected since XL12 would need a new iterator to span the larger
worksheet grid. Unfortunately, it seems this new iterator degrades
XL12's numeric accuracy relative to XL11.

Can anyone confirm whether XL12 forces precision to 15 decimal digits
in all intermediate calculations? FWIW, the gawk script

BEGIN { for (k = 1; k <= 2^20; ++k) s += k; printf("%.15e\n", s) }

produces the same result as XL12, and the gawk script accumulates the
sum in the variable s, which only provides 64-bit double precision
instead of the FPU's 80-bit extended precision.

Dave Peterson · Jun 6, 2006

I used xl2003 (sp2) (XL11) and got 0 for C7:C8 and AH7:AH8 (I added the extra
checks).

C1:C8
9.3827139731456E+13
9.3827139731456E+13
9.3827139731456E+13
TRUE
TRUE
TRUE
0.0000000000000E+00
0.0000000000000E+00

AH1:AH8
3.8430771795823E+17
3.8430771795823E+17
3.8430771795823E+17
TRUE
TRUE
TRUE
0.0000000000000E+00
0.0000000000000E+00

Harlan said:
Someone else please confirm this.

A:A filled with the formula =ROW()^2.

C1:
=SUMPRODUCT(ROW(A:A)^2)

C2:
=SUM(A:A)

C3: [array formula]
=SUMSQ(ROW(A:A))

C1:C3 all appear to show the same value, 3.84307717958020E+17. And the
next 3 formulas all return true.

C4:
=C1=C2

C5:
=C2=C3

C6:
=C1=C3

But the following formulas aren't consistent with this.

C7:
=STDEV(C1:C3) returns 8589934592.0

C8:
=DEVSQ(C1:C3) returns 12288

FWIW, the exact result, per Mathematica, should be
3.843077179582709760E+17, so the XL12 result is affected by truncation
error at the 12th decimal digit. On the other hand, in XL11, with the
following formula entered into A1:AF32768

=((COLUMN()-1)*32768+ROW())^2

each of the following formulas return 3.84307717958226E+17

AH1:
=SUMPRODUCT(A1:AF32768)

AH2:
=SUM(A1:AF32768)

AH3: [array formula]
=SUMSQ((COLUMN(A:AF)-1)*32768+ROW(1:32768))

And both the following formulas return 0.

AH4:
=STDEV(AH1:AH3)

AH5:
=DEVSQ(AH1:AH3)

The XL11 result is closer to the actual value than the XL12 answer, so
it appears something has changed between XL11 and XL12, which is to be
expected since XL12 would need a new iterator to span the larger
worksheet grid. Unfortunately, it seems this new iterator degrades
XL12's numeric accuracy relative to XL11.

Can anyone confirm whether XL12 forces precision to 15 decimal digits
in all intermediate calculations? FWIW, the gawk script

BEGIN { for (k = 1; k <= 2^20; ++k) s += k; printf("%.15e\n", s) }

produces the same result as XL12, and the gawk script accumulates the
sum in the variable s, which only provides 64-bit double precision
instead of the FPU's 80-bit extended precision.

Dave Peterson · Jun 6, 2006

Ps. I don't own the beta.

Harlan said:
Someone else please confirm this.

A:A filled with the formula =ROW()^2.

C1:
=SUMPRODUCT(ROW(A:A)^2)

C2:
=SUM(A:A)

C3: [array formula]
=SUMSQ(ROW(A:A))

C1:C3 all appear to show the same value, 3.84307717958020E+17. And the
next 3 formulas all return true.

C4:
=C1=C2

C5:
=C2=C3

C6:
=C1=C3

But the following formulas aren't consistent with this.

C7:
=STDEV(C1:C3) returns 8589934592.0

C8:
=DEVSQ(C1:C3) returns 12288

FWIW, the exact result, per Mathematica, should be
3.843077179582709760E+17, so the XL12 result is affected by truncation
error at the 12th decimal digit. On the other hand, in XL11, with the
following formula entered into A1:AF32768

=((COLUMN()-1)*32768+ROW())^2

each of the following formulas return 3.84307717958226E+17

AH1:
=SUMPRODUCT(A1:AF32768)

AH2:
=SUM(A1:AF32768)

AH3: [array formula]
=SUMSQ((COLUMN(A:AF)-1)*32768+ROW(1:32768))

And both the following formulas return 0.

AH4:
=STDEV(AH1:AH3)

AH5:
=DEVSQ(AH1:AH3)

The XL11 result is closer to the actual value than the XL12 answer, so
it appears something has changed between XL11 and XL12, which is to be
expected since XL12 would need a new iterator to span the larger
worksheet grid. Unfortunately, it seems this new iterator degrades
XL12's numeric accuracy relative to XL11.

Can anyone confirm whether XL12 forces precision to 15 decimal digits
in all intermediate calculations? FWIW, the gawk script

BEGIN { for (k = 1; k <= 2^20; ++k) s += k; printf("%.15e\n", s) }

produces the same result as XL12, and the gawk script accumulates the
sum in the variable s, which only provides 64-bit double precision
instead of the FPU's 80-bit extended precision.

aaron.kempf · Jun 6, 2006

who cares excel is roadkill

http://news.com.com/Google+Spreadsheets+turns+up+heat+on+Excel/2100-1032_3-6080223.html?tag=nefd.top

Dave said:
Ps. I don't own the beta.

Harlan said:

Someone else please confirm this.

A:A filled with the formula =ROW()^2.

C1:
=SUMPRODUCT(ROW(A:A)^2)

C2:
=SUM(A:A)

C3: [array formula]
=SUMSQ(ROW(A:A))

C1:C3 all appear to show the same value, 3.84307717958020E+17. And the
next 3 formulas all return true.

C4:
=C1=C2

C5:
=C2=C3

C6:
=C1=C3

But the following formulas aren't consistent with this.

C7:
=STDEV(C1:C3) returns 8589934592.0

C8:
=DEVSQ(C1:C3) returns 12288

FWIW, the exact result, per Mathematica, should be
3.843077179582709760E+17, so the XL12 result is affected by truncation
error at the 12th decimal digit. On the other hand, in XL11, with the
following formula entered into A1:AF32768

=((COLUMN()-1)*32768+ROW())^2

each of the following formulas return 3.84307717958226E+17

AH1:
=SUMPRODUCT(A1:AF32768)

AH2:
=SUM(A1:AF32768)

AH3: [array formula]
=SUMSQ((COLUMN(A:AF)-1)*32768+ROW(1:32768))

And both the following formulas return 0.

AH4:
=STDEV(AH1:AH3)

AH5:
=DEVSQ(AH1:AH3)

The XL11 result is closer to the actual value than the XL12 answer, so
it appears something has changed between XL11 and XL12, which is to be
expected since XL12 would need a new iterator to span the larger
worksheet grid. Unfortunately, it seems this new iterator degrades
XL12's numeric accuracy relative to XL11.

Can anyone confirm whether XL12 forces precision to 15 decimal digits
in all intermediate calculations? FWIW, the gawk script

BEGIN { for (k = 1; k <= 2^20; ++k) s += k; printf("%.15e\n", s) }

produces the same result as XL12, and the gawk script accumulates the
sum in the variable s, which only provides 64-bit double precision
instead of the FPU's 80-bit extended precision.

Click to expand...

Harlan Grove · Jun 6, 2006

Dave Peterson wrote...

I used xl2003 (sp2) (XL11) and got 0 for C7:C8 and AH7:AH8 (I added the extra
checks).

C1:C8
9.3827139731456E+13
9.3827139731456E+13
9.3827139731456E+13
TRUE
TRUE
TRUE
0.0000000000000E+00
0.0000000000000E+00

AH1:AH8
3.8430771795823E+17
3.8430771795823E+17
3.8430771795823E+17
TRUE
TRUE
TRUE
0.0000000000000E+00
0.0000000000000E+00

The C1:C8 formulas are in XL12 and the AH1:AH8 formulas in XL11? The
C1:C3 values should be equal to the AH1:AH3 values to 11 decimal
places, but the values you show are off for XL12 by several orders of
magnitude. (And if your formulas are sums of integers that sum up to
less than 1E+15, they should all be exact, so unaffected by truncation,
so the C7 and C8 formulas should both be zero.)

Dave Peterson · Jun 6, 2006

All my testing was in excel 2003.

I did the same tests once more (adding to the two from before--there was enough
of a difference that I thought I screwed it up the first time).

All 3 times gave the same results.

Harlan Grove · Jun 6, 2006

Dave Peterson wrote...

All my testing was in excel 2003.

....

OK, you've confirmed my XL11 results. Anyone else want to take a stab
at XL12?

Peo Sjoblom · Jun 6, 2006

I got the same results as you using beta 2 of Excel 12

--

Regards,

Peo Sjoblom

Guest · Jun 6, 2006

=C1=C2
is a less reliable comparison than
=C1-C2
which has been subject to a fuzz factor since Excel 97
http://support.microsoft.com/kb/78113
The only reliable test of exact equality is whether
=(C1-C2) is exactly zero.

To see this, put 1 in C1 and =1+2^-B2 in C2.
B2 Result
53 all show equality (since the addition is beyond DP precision
52 only =(C1-C2) detects the difference
49 =C1-C2 begins to detect the difference
47 =C1=C2 begins to detect the difference

The preceding results have been verified in 2000 and 2003.

Jerry

Harlan Grove · Jun 6, 2006

Jerry W. Lewis wrote...

=C1=C2
is a less reliable comparison than
=C1-C2
which has been subject to a fuzz factor since Excel 97
http://support.microsoft.com/kb/78113
The only reliable test of exact equality is whether
=(C1-C2) is exactly zero.

....

I accept this, but my concern is that XL12 beta shows nonzero results
for STDEV and DEVSQ of 3 different calculations of what mathematically
are the same thing, the sum of squares of the integers from 1 to 2^20
given by SUMPRODUCT of an array raised to the 2nd power, SUM of a range
of what should be the same values, and SUMSQ of the same unraised array
as used for SUMPRODUCT. Those calculations give *different* results
than XL11, but the same result as a simple gawk script in which all
intermediate calculations are stored in 64-bit double precision. My
concern is that XL11 and prior seem to have taked advantage of 80-bit
extended precision FPU registers for intermediate calculations inside
built-in function calls, but XL12 beta seems to behave in this
particular case like the gawk script, i.e., as if all intermediate
calculations are only in double precision.

Guest · Jun 7, 2006

As best as I can determine, the only difference in these calculations in 2007
vs earlier versions is the number of available rows. Everything else follows
from order of evaluation.

For A1:A1048576, the data is in a single column, so all three formulas sum
from smallest to largest, which generally minimizes truncation error. The
resulting sum is 384307717958019968 = (93825126454594*2^6+46)*2^6. All
versions of Excel give the reported DEVSQ when applied to (x,x,x) with x
equal to this value. Surprisingly, STDEV also gives the reported result in
all versions, despite the algorithm change in 2003.

For A1:AF32768, the formulas sum across each row in turn, which has
different truncation properties. The resulting sum (in either 2007 beta or
earlier versions) is 384307717958225920 = 93825126454645*2^12. All versions
of Excel give zero for DEVSQ and STDEV when applied to (x,x,x) with x equal
to this value.

The exact (unlimited precision) result of the summation is (2*n+1)*(n+1)*n/6
= 384307717958270976 = 733008800427*2^19 which is exactly representable in
IEEE double precision, although intermediates are not. If intermediate
partial sums were retained in 10 byte registers, then this value should be
the result of the summation. Therefore it appears that no version of Excel
has ever retained intermediate partial sums in 10-byte registers for SUM,
SUMPRODUCT, or SUMSQ.

The issue that the current algorithm for AVERAGE (and therefore DEVSQ,
STDEV, etc) need not be exact just because all values are equal, has been
discussed in previous threads that we have participated in. If AVERAGE (and
therefore DEVSQ) used 10-byte register storage of intermediates, then
AVERAGE(x,x,x) would always equal x, and hence DEVSQ(x,x,x) would always
equal zero. Therefore it appears that no version of Excel has ever retained
intermediates in 10-byte registers for AVERAGE or DEVSQ.

The increased worksheet size raises the potential for these anomalies to
crop up more often. IMHO the larger worksheet size strengthens the need for
MS to either store intermediates in 10-byte registers or else switch to
updating algorithms for these calculations, but they have not done either.
Interestingly, version 2.00 of Stephen Byeâ€™s Spread32 program
http://www.byedesign.freeserve.co.uk/
has switched to updating algorithms and so returns zero for DEVSQ(x,x,x)
and STDEV(x,x,x) for with any value of x. He also has probability functions
that approach the accuracy and working range of Ian Smithâ€™s VBA library, so
Spread32 now sets the standard for accuracy of native worksheet statistical
functions.

Jerry

Guest · Jun 7, 2006

And to complete the evidence, note that with
=(COLUMN()+32*(ROW()-1))^2
in A1:AF32768 and
=SUMSQ(COLUMN(A:AF)+32*(ROW(1:32768)-1))
array entered in AH3, you force the same order of evaluation in earlier
Excel versions and therefore get the same results as from single column
evaluation in 2007 beta.

Jerry

Harlan Grove · Jun 7, 2006

Jerry W. Lewis wrote...

As best as I can determine, the only difference in these calculations in 2007
vs earlier versions is the number of available rows. Everything else follows
from order of evaluation.

....

You're right. I hadn't considered rowwise vs columnwise evaluation
order. When I used the same formulas in XL11 and XL12, they produced
the same results.

For A1:A1048576, the data is in a single column, so all three formulas sum
from smallest to largest, which generally minimizes truncation error. The
resulting sum is 384307717958019968 = (93825126454594*2^6+46)*2^6. All
versions of Excel give the reported DEVSQ when applied to (x,x,x) with x
equal to this value. Surprisingly, STDEV also gives the reported result in
all versions, despite the algorithm change in 2003.

....

I suppose the question is how the 3 formula could product different
results if they all sum the same squares. I just tested the STDEV and
DEVSQ formulas in both XL11 and XL12 by replacing the SUM and SUMSQ
formula with simple formula references to the SUMPRODUCT formula, so

AH1:
=SUMPRODUCT(...)

AH2:
=A1

AH3:
=AH1

and the STDEV and DEVSQ formulas return the same NONZERO results. So
it's true this isn't a NEW bug in XL12, but it's an esoteric bug that
HAS ESISTED in previous versions and REMAINS in XL12.

The exact (unlimited precision) result of the summation is (2*n+1)*(n+1)*n/6
= 384307717958270976 = 733008800427*2^19 which is exactly representable in
IEEE double precision, although intermediates are not. If intermediate
partial sums were retained in 10 byte registers, then this value should be
the result of the summation. Therefore it appears that no version of Excel
has ever retained intermediate partial sums in 10-byte registers for SUM,
SUMPRODUCT, or SUMSQ.

The issue that the current algorithm for AVERAGE (and therefore DEVSQ,
STDEV, etc) need not be exact just because all values are equal, has been
discussed in previous threads that we have participated in. If AVERAGE (and
therefore DEVSQ) used 10-byte register storage of intermediates, then
AVERAGE(x,x,x) would always equal x, and hence DEVSQ(x,x,x) would always
equal zero. Therefore it appears that no version of Excel has ever retained
intermediates in 10-byte registers for AVERAGE or DEVSQ.

....

OK. Thanks for correcting me.

However, the average of 3 very large equal values calculated by summing
all 3 then dividing by 3 should lose at most 1 bit of precision. The
DEVSQ result appears consistent with this, but the STDEV result
doesn't. The STDEV result could be the result of a poor updating
algorithm.

Guest · Jun 7, 2006

:
....

However, the average of 3 very large equal values calculated by summing
all 3 then dividing by 3 should lose at most 1 bit of precision. The
DEVSQ result appears consistent with this, but the STDEV result
doesn't. The STDEV result could be the result of a poor updating
algorithm.

Now there's a $64,000 question! While the new (2003) STDEV algorithm
usually behaves like =SQRT(DEVSQ(data)/(COUNT(data)-1)), here with extremely
challenging numerical data, it does not. Why not? Why does MS continue to
re-invent the wheel when they don't have to?

Jerry

Excel beta - another bug

Harlan Grove

Dave Peterson

Dave Peterson

aaron.kempf

Harlan Grove

Dave Peterson

Harlan Grove

Peo Sjoblom

Guest

Harlan Grove

Guest

Guest

Harlan Grove

Guest