Trendline equation in chart is wrong

I

Iskandar Taib

I'm having trouble plugging the equation from the trendline of a chart
back into the Worksheet to get calculated interpolated values. In
essence, I added a 4th order polynomial regression line to an X-Y
chart, and then used the displayed equation in the worksheet, working
off the X values to get the Y values for the regression curve. It
doesn't work. The calculated values are way, way off (several orders
of magnitude). I tried it again, lopping off 977 from the plotted Y
values, in case dealing with large numbers was causing problems.
Again, no joy. I came across an earlier thread describing this same
problem, and it was suggested that the number of decimal places in the
displayed equation be increased. I did that, no change.

Can someone tell me a way out of this? I don't think I made any
mistakes when I copied the formula over, but you never know. The
worksheet in question (I've trimmed it down to the bare essentials,
yet it's 200+ KBytes) is here, if you'd like a look:

http://geology.um.edu.my/iskandar/misc/regression-problem.xls

I've scanned it for viruses, but if you download it, you might want to
scan it again.
 
J

Jerry W. Lewis

By default, the displayed equation is heavily rounded. Right click on
the equation and format as Scientific notation with 14 decimal places.

Alternately, David Braden has posted VBA code to extract the
coefficients directly from the chart into cells

http://groups.google.com/[email protected]

Also
=LINEST(y_data,x_data^{1,2,3,4})
fits a 4th order polynomial.

The advantage of using the chart coefficients (either manually or via
Braden's code) is that LINEST (prior to Excel 2003) uses a numerically
poor algorithm that can give inaccurate results with some data sets.
The chart trendline (extracted by Braden's code) is much better numerically.

Jerry
 
I

Iskandar Taib

Jerry W. Lewis said:
By default, the displayed equation is heavily rounded. Right click on
the equation and format as Scientific notation with 14 decimal places.

Alternately, David Braden has posted VBA code to extract the
coefficients directly from the chart into cells

http://groups.google.com/[email protected]

Also
=LINEST(y_data,x_data^{1,2,3,4})
fits a 4th order polynomial.

The advantage of using the chart coefficients (either manually or via
Braden's code) is that LINEST (prior to Excel 2003) uses a numerically
poor algorithm that can give inaccurate results with some data sets.
The chart trendline (extracted by Braden's code) is much better numerically.

Jerry

ARGHH!!!! I finally figured out what I've been doing wrong - and I've
been beating my head against the wall with this for about six months
now. It was pretty simple - I was substituting the "y" value into the
equation instead of the "x" value.

Pretty embarassing!!! Grrr..

Incidentally, I found David Braden's VBA code. Is there a set of
step-by-step instructions on how to use it?

Thanks!
 
I

Iskandar Taib

Jerry W. Lewis said:
By default, the displayed equation is heavily rounded. Right click on
the equation and format as Scientific notation with 14 decimal places.

Alternately, David Braden has posted VBA code to extract the
coefficients directly from the chart into cells

http://groups.google.com/[email protected]

Also
=LINEST(y_data,x_data^{1,2,3,4})
fits a 4th order polynomial.

The advantage of using the chart coefficients (either manually or via
Braden's code) is that LINEST (prior to Excel 2003) uses a numerically
poor algorithm that can give inaccurate results with some data sets.
The chart trendline (extracted by Braden's code) is much better numerically.

Jerry

Hello

The earlier spreadsheet did indeed have a problem (I was referencing
"y" values instead of "x" when I plugged in the equation), but I've
still got problems. Have a look again at:

http://geology.um.edu.my/iskandar/Misc/regression-problem.xls

I have data in column E, which I plotted against A in Chart3. The
regression line gives the equation shown. As per your instructions,
the equation shows
17 decimal places. When the equation is plugged back in, I get the
data points that plot as red on Chart3.

Thanks,

Iskandar
 
J

Jerry W. Lewis

Thank you for not posting an attachment to the newsgroups. Since there
were no macros in you file, I did open it.

Excel uses IEEE standard double precision (~15 decimal digit accuracy)
which is not sufficient to do the calculations for this problem. You
can easily see this from your graph, which is not smooth. A quintic
equation has at most 4 local maxima and minima, but your graph shows
hundreds of local maxima and minima for you calculated quintic,
indicating that the calculations are very noisy.

The trendline formula subtracts a constant of roughly -3E15 to get a
result that is between zero and -300. Such a result could not be
accurate in the units place even if the coefficients had full 15-digit
accuracy. In fact, the trendline coefficients for your problem have
roughly 9 digit accuracy, so all precision is necessarily lost in the
subtractions. Personally I am amazed that Excel is able to get any
accuracy for this problem, since the condition number for X'X is about
4E44. You would not do as well with SAS or S-Plus (though they would
warn you that the problem is numerically singular).

There is little difference over your data range between a quartic (4th
degree) and a quintic (5th degree) polynomial. The quartic presses the
limits of IEEE double precision, but but might allow you to calculate
meaningful predictions.

You might also have better success fitting a polynomial in
x' = x-AVERAGE(x)

If you really believe your data collection process to be capable of
producing enough precision in your x and y values to distinguish between
quartic and quintic models, then for this problem you should use
software that would support more than IEEE double precision, such as
Maple or Mathematica.

Jerry
 
I

Iskandar Taib

Jerry W. Lewis said:
Thank you for not posting an attachment to the newsgroups. Since there
were no macros in you file, I did open it.

OK, thanks!
Excel uses IEEE standard double precision (~15 decimal digit accuracy)
which is not sufficient to do the calculations for this problem. You
can easily see this from your graph, which is not smooth. A quintic
equation has at most 4 local maxima and minima, but your graph shows
hundreds of local maxima and minima for you calculated quintic,
indicating that the calculations are very noisy.

The trendline formula subtracts a constant of roughly -3E15 to get a
result that is between zero and -300. Such a result could not be
accurate in the units place even if the coefficients had full 15-digit
accuracy. In fact, the trendline coefficients for your problem have
roughly 9 digit accuracy, so all precision is necessarily lost in the
subtractions. Personally I am amazed that Excel is able to get any
accuracy for this problem, since the condition number for X'X is about
4E44. You would not do as well with SAS or S-Plus (though they would
warn you that the problem is numerically singular).

There is little difference over your data range between a quartic (4th
degree) and a quintic (5th degree) polynomial. The quartic presses the
limits of IEEE double precision, but but might allow you to calculate
meaningful predictions.

You might also have better success fitting a polynomial in
x' = x-AVERAGE(x)

If you really believe your data collection process to be capable of
producing enough precision in your x and y values to distinguish between
quartic and quintic models, then for this problem you should use
software that would support more than IEEE double precision, such as
Maple or Mathematica.

Jerry

Yeah, the data is wind velocity, and has already been through a
smoothing filter. We're trying to "detrend" it so I can take the
difference between the real values and the regression values.

I can try S-Plus next (we've got a site license for it). I don't
understand how Excel can come up with the trendline displayed on the
graph, however, without being able to display the equation it used to
generate that trendline. Or is the equation displayed calculated
differently from the trendline itself? I might actually have to do
this the old-fashioned way - print it out and get the coordinates of
the plotted trendline graphically (then perhaps do a regression on
THOSE points to get the equation?).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top