RSQ problem: zero denominator

H

Help Me

The RSQ() denominator is the product of two expressions, each
of the form: n*SUMPRODUCT({y}^2) - SUM({y})^2.

When each cell of {y} is the same value, the expression evaluates
to zero [1]. Consequently, the denominator of RSQ() is zero.

For some same-valued {y}, RSQ() returns zero. But for other
same-valued {y}, RSQ() returns a divide-by-zero error.

First, why does RSQ() behave differently for different same-valued
{y}?

I might guess that floating-point round-off (quantization) error
causes some expressions not to be zero. But when I compute the
expression in a cell, the result is zero for both of the same-valued
{y} that result in different behavior. I presume that rules out
floating-point round-off error as an explanation.

Second, how can I detect the function error and replace it with
zero, short of computing each component of the RSQ() demoninator
in an IF() statement?

I vaguely recall that I can use a custom format to accomplish
this. But I have not been able to figure out how. An explicit
example would be appreciated.

Third, if this is a known defect that has been fixed, can someone
tell me the version or patch id of Excel that fixes this?

I am currently using Excel 2002 (10.4302.4219) SP-2. I think
RSQ() should tolerate a zero denominator since I think it is not
unusual for each cell of {y} to have the same value (horizontal
line).
 
J

Jerry W. Lewis

If you were to disclose the constant y-value and number of occurrences
of that value that produces RSQ=0, I might be able to explain it more
exactly. RSQ should return #DIV/0! if all y's are the same. When all
y's are the same and RSQ returns 0, you are seeing numerical
difficulties that result in a nonzero STDEV({y}).

For Excel versions prior to 2003, RSQ, STDEV, etc. use numerically
unstable formulas. If all y's are identical, then SUMPRODUCT({y}^2) is
n*y^2 and SUM({y}) is n*y. Thus y portion of the denominator is
n*(n*y^2) - (n*y)^2
Mathematically, this is zero, but numerically there may be different
rounding for n*(n*y^2) vs. (n*y)^2

A numerically better way to calculate RSQ is =CORREL({y},{x})^2 since
the CORREL function uses a formula that is much less likely to suffer
from serious rounding problems.

To substitute 0 for the #DIV/0! error when all y's are identical, I
would use
=IF(DEVSQ({y})=0,0,CORREL({y},{x})^2)
You could alternately use
=IF(ISERROR(CORREL({y},{x})),0,CORREL({y},{x})^2)
but that would suppress any error, not just those caused by identical y's.

Jerry
 
H

Help Me

Jerry W. Lewis said:
If all y's are identical, then SUMPRODUCT({y}^2) is
n*y^2 and SUM({y}) is n*y. Thus y portion of the denominator is
n*(n*y^2) - (n*y)^2
Mathematically, this is zero, but numerically there may be different
rounding for n*(n*y^2) vs. (n*y)^2

I agree, as I indicated in my original posting.
But I thought I had ruled out that possibility
by computing n*SUMPROCDUCT(), SUM()^2, and their
difference in separate cells, which I displayed
with 16 decimal places. In both cases of {y},
the difference appeared to be zero.

However ....
If you were to disclose the constant y-value and number of occurrences
of that value that produces RSQ=0, I might be able to explain it more
exactly.

When I looked at what worked (RSQ is zero) and what
failed (RSQ is #DIV/0), I am convinced that
floating-point representation is indeed the correct
explanation. In one case, {y} contains 102.15; in
the other case, {y} contains 127.6875. Note that
127.6875 can be represented exactly in IEEE
floating-point and with very few bits of the mantissa,
whereas 102.15 cannot. Thus, as you say, I should not
be surprised that n*(n*y^2) is not the same numerical
result as (n*y)^2 in the latter case.

Nonetheless ....
RSQ should return #DIV/0! if all y's are the same.

I agree with that expectation mathematically. But
from a usability standpoint, I think that RSQ could
treat #DIV/0 as a special case, since that is the
expected result whenever one of the arrays represents
a horizontal or vertical line.

Moreover, the RSQ help text does not forewarn about
#DIV/0, whereas the CORREL help text does. That set
my expectation that RSQ handled the situation transparently.

But being "in the business" myself, I should know better
than to make inferences based on the omission of information.
To substitute 0 for the #DIV/0! error when all y's are identical, I
would use
=IF(DEVSQ({y})=0,0,CORREL({y},{x})^2)
You could alternately use
=IF(ISERROR(CORREL({y},{x})),0,CORREL({y},{x})^2)
but that would suppress any error, not just those caused by identical y's.

Thanks for the tips.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top