PC Review


Reply
Thread Tools Rate Thread

Correlation coefficient for log plot..

 
 
Phil C
Guest
Posts: n/a
 
      5th Aug 2004
Hi All

I posted on 3 August ("Growth function falls over..") about LOGEST's
inability to handle missing data.
My data [x=120, 150, 200, 235; y=100, 75.5, 61.2, 50.1) plots on a log/lin
plot (x-y scattergraph).

Jerry Lewis came up with a neat solution using SLOPE and INTERCEPT to
replace =INDEX(LOGEST, etc..), which works fine, but I now have a similar
problem with the correlation coefficient, r. The only way I have found for
(presumed) accurate calculation of r is to use LOGEST entered as an array
formula to get the extra stats, including r squared. This agrees exactly
with the value that you can display (Format trendline|options) along with
the (exponential) trendline (r squared = 0.9754), but I need an alternative
to LOGEST! The CORREL function seems to get the wrong answer for my data
(r=-0.9688, so r squared = 0.9405) . but seems to work OK for ordinary
linear data. The discepancy between LOGEST and CORREL varies as you play
with the data but the difference is always significant. Do a long-hand
formula for 'r'?

Any ideas?

Thanks, Phil



 
Reply With Quote
 
 
 
 
Michael R Middleton
Guest
Posts: n/a
 
      6th Aug 2004
Phil -

> I posted on 3 August ("Growth function falls over..") about LOGEST's

inability to handle missing data. My data [x=120, 150, 200, 235; y=100,
75.5, 61.2, 50.1) plots on a log/lin plot (x-y scattergraph).
>
> Jerry Lewis came up with a neat solution using SLOPE and INTERCEPT to

replace =INDEX(LOGEST, etc..), which works fine, but I now have a similar
problem with the correlation coefficient, r. The only way I have found for
(presumed) accurate calculation of r is to use LOGEST entered as an array
formula to get the extra stats, including r squared. This agrees exactly
with the value that you can display (Format trendline|options) along with
the (exponential) trendline (r squared = 0.9754), but I need an alternative
to LOGEST! The CORREL function seems to get the wrong answer for my data
(r=-0.9688, so r squared = 0.9405) . but seems to work OK for ordinary
linear data. The discepancy between LOGEST and CORREL varies as you play
with the data but the difference is always significant. Do a long-hand
formula for 'r'?
>
> Any ideas?


Use array-entered CORREL(LN(y-range),x-range).

- Mike

www.mikemiddleton.com

P.S. Here are some relevant excerpts from my book "Data Analysis Using
Microsoft Excel: Updated for Office XP":

The exponential model creates a trendline using the equation

y = c*(e^(b*x)).

Excel uses a log transformation of the original y data to determine fitted
values, so the values of the dependent variable in your data set must be
positive. If any y values are zero or negative, the Exponential icon on the
Add Trendline Type tab will be grayed out. (As a workaround, you can add a
constant to each y value.)

The exponential trendline feature does not find values of b and c that
minimize the sum of squared deviations between actual y and predicted y
(=c*(e^(b*x))). Instead, Excel's method takes the logarithm of both sides of
the exponential formula, which then can be written as

Ln(y) = Ln(c) + b * x

and uses standard linear regression with Ln(y) as the dependent variable and
x as the explanatory variable. That is, Excel finds the intercept and slope
that minimize the sum of squared deviations between actual Ln(y) and
predicted Ln(y), using the formula

Ln(y) = Intercept + Slope * x.

Therefore, the Intercept value corresponds to Ln(c), and c in the
exponential formula is equal to Exp(Intercept). The Slope value corresponds
to b in the exponential formula.




 
Reply With Quote
 
Phil C
Guest
Posts: n/a
 
      6th Aug 2004
Mike

Many thanks. I may even buy the book!

Phil


"Michael R Middleton" <(E-Mail Removed)> wrote in message
news:#(E-Mail Removed)...
> Phil -
>
> > I posted on 3 August ("Growth function falls over..") about LOGEST's

> inability to handle missing data. My data [x=120, 150, 200, 235; y=100,
> 75.5, 61.2, 50.1) plots on a log/lin plot (x-y scattergraph).
> >
> > Jerry Lewis came up with a neat solution using SLOPE and INTERCEPT to

> replace =INDEX(LOGEST, etc..), which works fine, but I now have a similar
> problem with the correlation coefficient, r. The only way I have found for
> (presumed) accurate calculation of r is to use LOGEST entered as an array
> formula to get the extra stats, including r squared. This agrees exactly
> with the value that you can display (Format trendline|options) along with
> the (exponential) trendline (r squared = 0.9754), but I need an

alternative
> to LOGEST! The CORREL function seems to get the wrong answer for my data
> (r=-0.9688, so r squared = 0.9405) . but seems to work OK for ordinary
> linear data. The discepancy between LOGEST and CORREL varies as you play
> with the data but the difference is always significant. Do a long-hand
> formula for 'r'?
> >
> > Any ideas?

>
> Use array-entered CORREL(LN(y-range),x-range).
>
> - Mike
>
> www.mikemiddleton.com
>
> P.S. Here are some relevant excerpts from my book "Data Analysis Using
> Microsoft Excel: Updated for Office XP":
>
> The exponential model creates a trendline using the equation
>
> y = c*(e^(b*x)).
>
> Excel uses a log transformation of the original y data to determine fitted
> values, so the values of the dependent variable in your data set must be
> positive. If any y values are zero or negative, the Exponential icon on

the
> Add Trendline Type tab will be grayed out. (As a workaround, you can add a
> constant to each y value.)
>
> The exponential trendline feature does not find values of b and c that
> minimize the sum of squared deviations between actual y and predicted y
> (=c*(e^(b*x))). Instead, Excel's method takes the logarithm of both sides

of
> the exponential formula, which then can be written as
>
> Ln(y) = Ln(c) + b * x
>
> and uses standard linear regression with Ln(y) as the dependent variable

and
> x as the explanatory variable. That is, Excel finds the intercept and

slope
> that minimize the sum of squared deviations between actual Ln(y) and
> predicted Ln(y), using the formula
>
> Ln(y) = Intercept + Slope * x.
>
> Therefore, the Intercept value corresponds to Ln(c), and c in the
> exponential formula is equal to Exp(Intercept). The Slope value

corresponds
> to b in the exponential formula.
>
>
>
>



 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Correlation-Coefficient daniel chen Microsoft Excel Misc 3 31st Jan 2006 07:59 AM
spearman correlation coefficient =?Utf-8?B?TWFyY2VsIExhYm9udMOp?= Microsoft Excel Misc 1 12th May 2005 04:00 AM
Correlation Coefficient Issue Diane Microsoft Excel Worksheet Functions 8 17th Dec 2004 03:48 AM
Correlation coefficient (r) Phil C Microsoft Excel Charting 3 14th Aug 2004 04:27 AM
correlation coefficient when I have totals for each data pair =?Utf-8?B?ZGFu?= Microsoft Excel Misc 1 15th Mar 2004 05:07 AM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 11:34 PM.