PC Review


Reply
Thread Tools Rate Thread

Correlation coefficient (r)

 
 
Phil C
Guest
Posts: n/a
 
      11th Aug 2004
Hi folks,

This question is similar to Elizabeth Brown's, but with a slight twist. I am
also looking to calculate r ..from a exponential function plotted as a
log-lin scatter graph:

I posted last week (5 August) about a way of calculating the correlation
coefficient from a log-lin plot that is tolerant of deleted/mssing data, but
the proposed solution (CORREL array-entered) doesn't seem to work. Unless I
have overlooked something?

To restate, I have a Log-lin plot based on data in columns A and C. Column C
contains the y-values. Columns A and B just contain data (A1:A4 =
100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
BTW, have to use NA() rather than "" or the graph fails when a data point is
deleted.

The log lin x-y plot + exponential trendline + equation derived from columns
A and C all work fine. Deleting either an x value (from column A) or a y
value from column B causes the associated data point to disappear from the
graph and the trendline is sensibly recalculated, including the correlation
coefficient (r). Unfortunately, I need to use the r value for further
calculations and am trying to use the CORREL func: =CORREL(LN(C1:C4),A1:A4),
which fails (returns #NA) if I delete one of the y values (say C3). It is OK
if I delete one of the x-values. If the R squared value displayed as a
trendline option can 'survive' deleting a y value then it must surely be
possible using a standard function? I have already found out (via this NG)
that using SLOPE and INTERCEPT instead of INDEX(LOGEST..) will give the
slope and intercept values even when there is missing data. I now just need
r!

Thanks for your help..

Phil



 
Reply With Quote
 
 
 
 
Jerry W. Lewis
Guest
Posts: n/a
 
      13th Aug 2004
You will usually get better results to a follow-up question if you keep
it in the same thread. Mike Middleton is very good, and probably marked
that thread to see if you had further questions ...

=CORREL(LN(y_range),x_range)
should work just fine UNLESS you have missing data. Empty cells in
y_range would be interpreted as zero by LN. LN(0) returns #NUM! and
LN(text_values) returns #VALUE!, so either way Mike's formula would
return an error if you have missing values (which you didn't mention in
your original post). To avoid the errors in the LN() calculation,
expand the formula as follows:
=CORREL(IF(ISNUMBER(y_range),LN(y_range)),x_range)
which still requires array entry.

Jerry

Phil C wrote:

> Hi folks,
>
> This question is similar to Elizabeth Brown's, but with a slight twist. I am
> also looking to calculate r ..from a exponential function plotted as a
> log-lin scatter graph:
>
> I posted last week (5 August) about a way of calculating the correlation
> coefficient from a log-lin plot that is tolerant of deleted/mssing data, but
> the proposed solution (CORREL array-entered) doesn't seem to work. Unless I
> have overlooked something?
>
> To restate, I have a Log-lin plot based on data in columns A and C. Column C
> contains the y-values. Columns A and B just contain data (A1:A4 =
> 100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
> column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
> BTW, have to use NA() rather than "" or the graph fails when a data point is
> deleted.
>
> The log lin x-y plot + exponential trendline + equation derived from columns
> A and C all work fine. Deleting either an x value (from column A) or a y
> value from column B causes the associated data point to disappear from the
> graph and the trendline is sensibly recalculated, including the correlation
> coefficient (r). Unfortunately, I need to use the r value for further
> calculations and am trying to use the CORREL func: =CORREL(LN(C1:C4),A1:A4),
> which fails (returns #NA) if I delete one of the y values (say C3). It is OK
> if I delete one of the x-values. If the R squared value displayed as a
> trendline option can 'survive' deleting a y value then it must surely be
> possible using a standard function? I have already found out (via this NG)
> that using SLOPE and INTERCEPT instead of INDEX(LOGEST..) will give the
> slope and intercept values even when there is missing data. I now just need
> r!
>
> Thanks for your help..
>
> Phil
>
>
>
>


 
Reply With Quote
 
Phil C
Guest
Posts: n/a
 
      13th Aug 2004
Jerry

Many thanks. It works! I was begining to give up on this one after pondering
it for the last week!
Point taken about keeping within the original thread, even though the
original post was over a week earlier.

Phil

PS. I have ordered Mike Middleton's book so should be better equipped to
deal with the next stats problem..


"Jerry W. Lewis" <post_a_reply@no_e-mail.com> wrote in message
news:411C9837.1000509@no_e-mail.com...
> You will usually get better results to a follow-up question if you keep
> it in the same thread. Mike Middleton is very good, and probably marked
> that thread to see if you had further questions ...
>
> =CORREL(LN(y_range),x_range)
> should work just fine UNLESS you have missing data. Empty cells in
> y_range would be interpreted as zero by LN. LN(0) returns #NUM! and
> LN(text_values) returns #VALUE!, so either way Mike's formula would
> return an error if you have missing values (which you didn't mention in
> your original post). To avoid the errors in the LN() calculation,
> expand the formula as follows:
> =CORREL(IF(ISNUMBER(y_range),LN(y_range)),x_range)
> which still requires array entry.
>
> Jerry
>
> Phil C wrote:
>
> > Hi folks,
> >
> > This question is similar to Elizabeth Brown's, but with a slight twist.

I am
> > also looking to calculate r ..from a exponential function plotted as a
> > log-lin scatter graph:
> >
> > I posted last week (5 August) about a way of calculating the correlation
> > coefficient from a log-lin plot that is tolerant of deleted/mssing data,

but
> > the proposed solution (CORREL array-entered) doesn't seem to work.

Unless I
> > have overlooked something?
> >
> > To restate, I have a Log-lin plot based on data in columns A and C.

Column C
> > contains the y-values. Columns A and B just contain data (A1:A4 =
> > 100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
> > column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
> > BTW, have to use NA() rather than "" or the graph fails when a data

point is
> > deleted.
> >
> > The log lin x-y plot + exponential trendline + equation derived from

columns
> > A and C all work fine. Deleting either an x value (from column A) or a y
> > value from column B causes the associated data point to disappear from

the
> > graph and the trendline is sensibly recalculated, including the

correlation
> > coefficient (r). Unfortunately, I need to use the r value for further
> > calculations and am trying to use the CORREL func:

=CORREL(LN(C1:C4),A1:A4),
> > which fails (returns #NA) if I delete one of the y values (say C3). It

is OK
> > if I delete one of the x-values. If the R squared value displayed as a
> > trendline option can 'survive' deleting a y value then it must surely be
> > possible using a standard function? I have already found out (via this

NG)
> > that using SLOPE and INTERCEPT instead of INDEX(LOGEST..) will give the
> > slope and intercept values even when there is missing data. I now just

need
> > r!
> >
> > Thanks for your help..
> >
> > Phil
> >
> >
> >
> >

>



 
Reply With Quote
 
Jerry W. Lewis
Guest
Posts: n/a
 
      14th Aug 2004
Glad it helped.

Jerry

Phil C wrote:

> Jerry
>
> Many thanks. It works! I was begining to give up on this one after pondering
> it for the last week!
> Point taken about keeping within the original thread, even though the
> original post was over a week earlier.
>
> Phil
>
> PS. I have ordered Mike Middleton's book so should be better equipped to
> deal with the next stats problem..
>
>
> "Jerry W. Lewis" <post_a_reply@no_e-mail.com> wrote in message
> news:411C9837.1000509@no_e-mail.com...
>
>>You will usually get better results to a follow-up question if you keep
>>it in the same thread. Mike Middleton is very good, and probably marked
>>that thread to see if you had further questions ...
>>
>> =CORREL(LN(y_range),x_range)
>>should work just fine UNLESS you have missing data. Empty cells in
>>y_range would be interpreted as zero by LN. LN(0) returns #NUM! and
>>LN(text_values) returns #VALUE!, so either way Mike's formula would
>>return an error if you have missing values (which you didn't mention in
>>your original post). To avoid the errors in the LN() calculation,
>>expand the formula as follows:
>> =CORREL(IF(ISNUMBER(y_range),LN(y_range)),x_range)
>>which still requires array entry.
>>
>>Jerry
>>
>>Phil C wrote:
>>
>>
>>>Hi folks,
>>>
>>>This question is similar to Elizabeth Brown's, but with a slight twist.
>>>

> I am
>
>>>also looking to calculate r ..from a exponential function plotted as a
>>>log-lin scatter graph:
>>>
>>>I posted last week (5 August) about a way of calculating the correlation
>>>coefficient from a log-lin plot that is tolerant of deleted/mssing data,
>>>

> but
>
>>>the proposed solution (CORREL array-entered) doesn't seem to work.
>>>

> Unless I
>
>>>have overlooked something?
>>>
>>>To restate, I have a Log-lin plot based on data in columns A and C.
>>>

> Column C
>
>>>contains the y-values. Columns A and B just contain data (A1:A4 =
>>>100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
>>>column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
>>>BTW, have to use NA() rather than "" or the graph fails when a data
>>>

> point is
>
>>>deleted.
>>>
>>>The log lin x-y plot + exponential trendline + equation derived from
>>>

> columns
>
>>>A and C all work fine. Deleting either an x value (from column A) or a y
>>>value from column B causes the associated data point to disappear from
>>>

> the
>
>>>graph and the trendline is sensibly recalculated, including the
>>>

> correlation
>
>>>coefficient (r). Unfortunately, I need to use the r value for further
>>>calculations and am trying to use the CORREL func:
>>>

> =CORREL(LN(C1:C4),A1:A4),
>
>>>which fails (returns #NA) if I delete one of the y values (say C3). It
>>>

> is OK
>
>>>if I delete one of the x-values. If the R squared value displayed as a
>>>trendline option can 'survive' deleting a y value then it must surely be
>>>possible using a standard function? I have already found out (via this
>>>

> NG)
>
>>>that using SLOPE and INTERCEPT instead of INDEX(LOGEST..) will give the
>>>slope and intercept values even when there is missing data. I now just
>>>

> need
>
>>>r!
>>>
>>>Thanks for your help..
>>>
>>>Phil


 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Correlation-Coefficient daniel chen Microsoft Excel Misc 3 31st Jan 2006 07:59 AM
spearman correlation coefficient =?Utf-8?B?TWFyY2VsIExhYm9udMOp?= Microsoft Excel Misc 1 12th May 2005 04:00 AM
Correlation Coefficient Issue Diane Microsoft Excel Worksheet Functions 8 17th Dec 2004 03:48 AM
Correlation coefficient for log plot.. Phil C Microsoft Excel Charting 2 6th Aug 2004 08:34 AM
correlation coefficient when I have totals for each data pair =?Utf-8?B?ZGFu?= Microsoft Excel Misc 1 15th Mar 2004 05:07 AM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 11:34 PM.