Excel error in R-squared computation in exponential regression

G

Guest

Excel computes the R-squared of an exponential regression wrongly: instead of
thecorrect result it reports the R-squared of the underlying linera
regression of the logarithm of the data. The two numbers are close if the
correlation is good, put far apart if it isn't. This Excel grossly
exaggerates the R-squared.

I am trying to bring this to the attention of Microsoft. It is of some
importance to me as I will be teaching this topic to my Math class in two
weeks.

So far I have not been able to wade through the layers of bureaucracy.
--
Bjoern Schellenberg

----------------
This post is a suggestion for Microsoft, and Microsoft responds to the
suggestions with the most votes. To vote for this suggestion, click the "I
Agree" button in the message pane. If you do not see the button, follow this
link to open the suggestion in the Microsoft Web-based Newsreader and then
click "I Agree" in the message pane.

http://www.microsoft.com/office/com...b-e8efcfaa282b&dg=microsoft.public.excel.misc
 
J

Jim Cone

There has been much discussion on the validity of MS Excel statistical functions.
MS has been beat up pretty badly over it.
MS even admits their R2 chart calculations are incorrect ...
http://support.microsoft.com/kb/829249
"You will receive an incorrect R-squared value in the chart tool in Excel 2003"

Here are a couple of papers that discuss some Excel statistical issues...
http://www.daheiser.info/excel/frontpage.html
http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerics_in_Excel
--
Jim Cone
San Francisco, USA
http://www.realezsites.com/bus/primitivesoftware
(Excel Add-ins / Excel Programming)



"Bjoern" <[email protected]>
wrote in message
Excel computes the R-squared of an exponential regression wrongly: instead of
thecorrect result it reports the R-squared of the underlying linera
regression of the logarithm of the data. The two numbers are close if the
correlation is good, put far apart if it isn't. This Excel grossly
exaggerates the R-squared.

I am trying to bring this to the attention of Microsoft. It is of some
importance to me as I will be teaching this topic to my Math class in two
weeks.

So far I have not been able to wade through the layers of bureaucracy.
--
Bjoern Schellenberg

----------------
This post is a suggestion for Microsoft, and Microsoft responds to the
suggestions with the most votes. To vote for this suggestion, click the "I
Agree" button in the message pane. If you do not see the button, follow this
link to open the suggestion in the Microsoft Web-based Newsreader and then
click "I Agree" in the message pane.

http://www.microsoft.com/office/com...b-e8efcfaa282b&dg=microsoft.public.excel.misc
 
M

Mike Middleton

Bjoern -
Excel computes the R-squared of an exponential regression wrongly ... <

Yes, but this isn't new news, and in my opinion it's unlikely that Microsoft
will ever fix it. And it does give you an important discussion topic for
your math class.

For the exponential fit, the equation is y = c*exp(b*x). Excel takes the
natural log of both sides to get ln(y) = ln(c)+b*x and does regression with
ln(y) as the dependent variable and x as the explanatory variable, i.e., it
finds ln(c) and b that minimizes the sum of squared deviations between
actual ln(y) and predicted ln(y). So the reported R-squared is the
proportion of variation in ln(y) that can be explained by x, instead of the
proportion of variation in y that is explained by x.

Similarly, for the power fit, the equation is y=c*x^b. Excel takes the
natural log to get ln(y) = ln(c)+b*ln(x) and does regression with ln(y) as
the dependent variable and ln(x) as the explanatory variable. So the
reported R-squared is the proportion of variation in ln(y) that can be
explained by ln(x).

For an extension of your discussions in the math class, you could use Solver
to search for c and b that minimizes the sum of squared deviations between
actual y and predicted y. And so on.

- Mike Middleton
http://www.DecisionToolworks.com
Decision Analysis Add-ins for Excel
 
G

Guest

An excellent reply indeed. I had planned on having students calculate th
R-squared the proper way (after obtaining the parameters as Excel does, but
then plugging them back into the exponential equation and calculating
r-squared as 1-SSR/SSY (where SSR is the sum of the square of the residuals
from regressin and SSY is the sum of the squares of the deviation from the
mean).

I stumbled upon this trying to use exponential regression of world
population as a function of time - clearly not a good model. Graphically it
is obvious that the model is bad, but the R-squared reported was 85%, so I
knew something was wrong. I then found exactly what you stated: Excel reports
the R-squared of the linear regression of the logs and does not bother to
"plug back in and recalculate".

What I fiind surprising is that I seem unable to get the attention of anyone
at Mircosoft to this rather serious flaw.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top