Correlation coefficient (r)

P

Phil C

Hi folks,

This question is similar to Elizabeth Brown's, but with a slight twist. I am
also looking to calculate r ..from a exponential function plotted as a
log-lin scatter graph:

I posted last week (5 August) about a way of calculating the correlation
coefficient from a log-lin plot that is tolerant of deleted/mssing data, but
the proposed solution (CORREL array-entered) doesn't seem to work. Unless I
have overlooked something?

To restate, I have a Log-lin plot based on data in columns A and C. Column C
contains the y-values. Columns A and B just contain data (A1:A4 =
100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
BTW, have to use NA() rather than "" or the graph fails when a data point is
deleted.

The log lin x-y plot + exponential trendline + equation derived from columns
A and C all work fine. Deleting either an x value (from column A) or a y
value from column B causes the associated data point to disappear from the
graph and the trendline is sensibly recalculated, including the correlation
coefficient (r). Unfortunately, I need to use the r value for further
calculations and am trying to use the CORREL func: =CORREL(LN(C1:C4),A1:A4),
which fails (returns #NA) if I delete one of the y values (say C3). It is OK
if I delete one of the x-values. If the R squared value displayed as a
trendline option can 'survive' deleting a y value then it must surely be
possible using a standard function? I have already found out (via this NG)
that using SLOPE and INTERCEPT instead of INDEX(LOGEST..) will give the
slope and intercept values even when there is missing data. I now just need
r!

Thanks for your help..

Phil
 
J

Jerry W. Lewis

You will usually get better results to a follow-up question if you keep
it in the same thread. Mike Middleton is very good, and probably marked
that thread to see if you had further questions ...

=CORREL(LN(y_range),x_range)
should work just fine UNLESS you have missing data. Empty cells in
y_range would be interpreted as zero by LN. LN(0) returns #NUM! and
LN(text_values) returns #VALUE!, so either way Mike's formula would
return an error if you have missing values (which you didn't mention in
your original post). To avoid the errors in the LN() calculation,
expand the formula as follows:
=CORREL(IF(ISNUMBER(y_range),LN(y_range)),x_range)
which still requires array entry.

Jerry
 
P

Phil C

Jerry

Many thanks. It works! I was begining to give up on this one after pondering
it for the last week!
Point taken about keeping within the original thread, even though the
original post was over a week earlier.

Phil

PS. I have ordered Mike Middleton's book so should be better equipped to
deal with the next stats problem..


Jerry W. Lewis said:
You will usually get better results to a follow-up question if you keep
it in the same thread. Mike Middleton is very good, and probably marked
that thread to see if you had further questions ...

=CORREL(LN(y_range),x_range)
should work just fine UNLESS you have missing data. Empty cells in
y_range would be interpreted as zero by LN. LN(0) returns #NUM! and
LN(text_values) returns #VALUE!, so either way Mike's formula would
return an error if you have missing values (which you didn't mention in
your original post). To avoid the errors in the LN() calculation,
expand the formula as follows:
=CORREL(IF(ISNUMBER(y_range),LN(y_range)),x_range)
which still requires array entry.

Jerry

Phil said:
Hi folks,

This question is similar to Elizabeth Brown's, but with a slight twist. I am
also looking to calculate r ..from a exponential function plotted as a
log-lin scatter graph:

I posted last week (5 August) about a way of calculating the correlation
coefficient from a log-lin plot that is tolerant of deleted/mssing data, but
the proposed solution (CORREL array-entered) doesn't seem to work. Unless I
have overlooked something?

To restate, I have a Log-lin plot based on data in columns A and C. Column C
contains the y-values. Columns A and B just contain data (A1:A4 =
100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
BTW, have to use NA() rather than "" or the graph fails when a data point is
deleted.

The log lin x-y plot + exponential trendline + equation derived from columns
A and C all work fine. Deleting either an x value (from column A) or a y
value from column B causes the associated data point to disappear from the
graph and the trendline is sensibly recalculated, including the correlation
coefficient (r). Unfortunately, I need to use the r value for further
calculations and am trying to use the CORREL func: =CORREL(LN(C1:C4),A1:A4),
which fails (returns #NA) if I delete one of the y values (say C3). It is OK
if I delete one of the x-values. If the R squared value displayed as a
trendline option can 'survive' deleting a y value then it must surely be
possible using a standard function? I have already found out (via this NG)
that using SLOPE and INTERCEPT instead of INDEX(LOGEST..) will give the
slope and intercept values even when there is missing data. I now just need
r!

Thanks for your help..

Phil
 
J

Jerry W. Lewis

Glad it helped.

Jerry

Phil said:
Jerry

Many thanks. It works! I was begining to give up on this one after pondering
it for the last week!
Point taken about keeping within the original thread, even though the
original post was over a week earlier.

Phil

PS. I have ordered Mike Middleton's book so should be better equipped to
deal with the next stats problem..


You will usually get better results to a follow-up question if you keep
it in the same thread. Mike Middleton is very good, and probably marked
that thread to see if you had further questions ...

=CORREL(LN(y_range),x_range)
should work just fine UNLESS you have missing data. Empty cells in
y_range would be interpreted as zero by LN. LN(0) returns #NUM! and
LN(text_values) returns #VALUE!, so either way Mike's formula would
return an error if you have missing values (which you didn't mention in
your original post). To avoid the errors in the LN() calculation,
expand the formula as follows:
=CORREL(IF(ISNUMBER(y_range),LN(y_range)),x_range)
which still requires array entry.

Jerry

Phil C wrote:
I am
Unless I
Column C
contains the y-values. Columns A and B just contain data (A1:A4 =
100,150,200,250; B1:B4 = 4900,3700,3000,2500]. Column C is derived from
column B, thus: C1 =IF(B1="",NA(),B1), C2 =IF(B2="",NA(),B2) etc.
BTW, have to use NA() rather than "" or the graph fails when a data
point is
columns
the
correlation
=CORREL(LN(C1:C4),A1:A4),
is OK
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top