# Correlation Coefficient Issue

D

#### Diane

I have a formula in cell C7, the formula is:
CORREL(B4:B6,C4:C6)

The cells have the follwing values:
B4: 1.90
B5: 1.90
B6: 1.90
C4: 2.13
C5: 1.83
C6: 3.02

Cell C7 returns: 0.000

Here's where the problem comes in.
The formula in cell C15 is: CORREL(B12:B14,C12:C14)
The cells have the follwing values:
B12: 9.50
B13: 9.50
B14: 9.50
C12: 9.56
C13: 10.26
C13: 10.16

Cell C15 returns the error: #DIV/0!

Why is the formula producing the #DIV/0! error in the
second set and not the first or vice versa?

ANY help would be greatly appreciated.
Thanks, Diane

J

#### Jerry W. Lewis

See Help for CORREL worksheet function. The denominator is the product
of standard deviations for x and y. The standard deviation of B12:B14
is zero ...

Format B4:B6 to show 14 decimal places. You will find that they are not
all equal, hence the standard deviation is small, but not zero.

Jerry

D

#### Diane

If I increase B4:B6 to show 14 decimal places, the values
will only be 1.90000000000000 - Why would that make a
difference?

J

#### Jerry W. Lewis

I presume then that B4:B6 are calculated results. Because of decimal/
binary conversions, etc. it is possible to have slight differences
between calculated results that cannot be displayed. If you will post
(text only, please, no attachments) the formulas and the contents of
the referenced cells, I will endeavor to explain why these slight
differences are not arithmetic errors. Meanwhile, you will find that
at least one of the following formulae are not zero
=(B4-B5)
=(B4-B6)
=(B5-B6)
The parentheses in these formulas are required to see the differences
when the values are equal to 15 decimal places.

In cases such as this, rounding the results in B4:B6 is the only way to
get the expected result.

Jerry

J

#### Jerry W. Lewis

As a possibly useful preview, enter 1.9 into A15
In A14 enter the formula
=A15+2^-51
In A15 enter the formula
=A15-2^-51
Copy cell A14 and paste over A1:A13
Copy cell A16 and paste over A17:A30
Format A1:A16 as numeric with 15 decimal places

This illustrates that (as documented) Excel will only display 15 digits
(14 decimal places with these numbers), yet there are 23 distinct binary
numbers that all display as 1.90000000000000

In your case your cell formulas resulted in at least two of these 23
distinct values, hence the resulting nonzero correlation.

Jerry

D

#### Diane

Thanks for your help - Below are the values:

The following are manual entries:
B4: 1.90
B5: 1.90
B6: 1.90

The following are manual entries:
C4: 2.13
C5: 1.83
C6: 3.02

Formula in C7:
=CORREL(B4:B6,C4:C6)
The above returns 0.000

The following are manual entries:
B12: 9.50
B13: 9.50
B14: 9.50

The following are manual entries:
C12: 9.56
C13: 10.26
C14: 10.16

Formula in C15:
=CORREL(B12:B14,C12:C14)
The above returns #DIV/0!

Thank you again for your help

H

#### hrlngrv - ExcelForums.com

Diane wrote..
Thanks for your help - Below are the values
The following are manual entries
B4: 1.9
B5: 1.9
B6: 1.9
The following are manual entries
C4: 2.1
C5: 1.8
C6: 3.0
Formula in C7
=CORREL(B4:B6,C4:C6
The above returns 0.00
..

This is an error. FWIW, STDEV(B4:B6) returns 2.98023E-08 in XL97SR-2
but that's wrong because the standard deviation of a set of an
number of instances of the same number should be zero

This appears to be nothing more than one of the nastier instances o
Excel's poor numerical implementation in older versions

However, in all instances, if B4:B6 has zero variance, so zer
standard deviation, then it's correlation with any other set o
values is undefined. Putting this another way, there's no such thin
as correlation between constants and random variables or betwee
different constants

J

#### Jerry W. Lewis

Per Help for CORREL and DEVSQ,
=CORREL(B4:B6,C4:C6)
is equivalent to
=COVAR(B4:B6,C4:C6) / ( SQRT(DEVSQ(B4:B6)/COUNT(B4:B6)) *
SQRT(DEVSQ(C4:C6)/COUNT(C4:C6)) )

=DEVSQ(B12:B14) is zero but =DEVSQ(B4:B6) is not zero, hence the
difference in CORREL results.

=DEVSQ(B4:B6) is not zero because =(B4-AVERAGE(B4:B6)) is not zero.

That can happen when all of the values are identical, because computer
math is done to finite precision. Imagine that you had a decimal
computer that carried only 4 figures in calculations. In that case,
SUM(0.4444,0.4444,0.4444) would be 1.333 since 1.3332 would require a
5th figure. Thus AVERAGE(0.4444,0.4444,0.4444) would be 0.4443 instead
of 0.4444 and hence (0.4444-AVERAGE(0.4444,0.4444,0.4444)) would not be
zero.

You are probably wondering how this is applicable, when your repeated
number is 1.9 since Excel advertises 15 digit precision. The reason is
that Excel (and almost all other general purpose hardware and software)
does binary math, not decimal math. Most decimal fractions (including
1.9) is a non-terminating binary fraction, just as 1/3 is a
non-terminating decimal fraction. The trailing bit pattern in the
binary approximation to 1.9 is such that there is lost information when
you add 3 copies of 1.9 (crossing 2 powers of 2) just as in my
hypothetical example. If you chase through the details of the IEEE 754
standard, you will find that =(B4-AVERAGE(B4:B6)) should equal
1/4503599627370496 which is 2.220446049250313080847263336181640625E-16.
Excel displays that difference as 2.22044604925031E-16, indicating that
it is performing the arithmetic correctly.

9.5 is exactly representable in binary (an integer plus a power of 2),
hence finite precision arithmetic matches the infinite precision
arithmetic that you would mentally do.

Jerry

J