Regression results

G

Guest

I need to run regression analyses (linear & nonlinear) on many data sets. I
have observed that the resulting equations generated in a chart trend line
and that generated by the linest() function are frequently very different. I
have read Tushar Mehta's explanation of the problem, but need advice on which
is more accurate.
 
G

Guest

From what I have read:
the trendline is supposed to be very good, but you need to format it to show
a significant number of digits to capture that accuracy.

http://tinyurl.com/k2at5

Jerry Lewis appears to be a knowledgable person on this topic (Phd in
Statistics)


Note that improvements were made in xl2003

http://support.microsoft.com/?kbid=828888
Excel Statistical Functions: Overview of Improvements in Statistical
Functions for Excel 2003

http://support.microsoft.com/default.aspx?scid=fh;EN-US;xl2003StatFunc&style=flat2&product=xl2003
 
G

Guest

Thank you for your help (again)! I am using Excel 2003 and still having the
problem. I tried to find a way to contact Jerry W. Lewis without success.
Do you have any idea what his email address might be?
 
G

Guest

You have piqued my curiosity with your statement that the chart trendline and
LINEST are "frequently very different." Can you give a simple example?

The chart trendline uses a better algorithm, assuming that you are using an
"XY (Scatter)" chart instead of a "Line" chart. The "Line" chart presumes
that the x-values are 1,2,3,... regardless of any (possibly numeric) x
category labels that you may have supplied. Consequently trendlines from a
"Line" chart are often worse than useless.

Jerry
 
G

Guest

I believe that you have solved the issue. My X values are month numbers and
do not always begin with 1. I am using line charts. The process I have been
using is to take about 3 years of monthly data, break the series into 3
smaller series (early, mature and long term), curve fitting each of the 3
series, then joining the resulting curves to product a 3 year curve which I
plot over the original series. The 'long-term' curve is sometimes
extrapolated for an additional 2 years. The chart's trend line coefficients
frequently are different that those returned by the equivalent linest()
function. The differences in coefficients result in slightly different
curves -- just enough to inspire bank executives to ask questions. I believe
that your response about line charts may be the real issue. I'll change the
approach and use scatter charts.

Thank you for your help!
 
G

Guest

You're welcome. Glad it helped.

Jerry

Mitch said:
I believe that you have solved the issue. My X values are month numbers and
do not always begin with 1. I am using line charts. The process I have been
using is to take about 3 years of monthly data, break the series into 3
smaller series (early, mature and long term), curve fitting each of the 3
series, then joining the resulting curves to product a 3 year curve which I
plot over the original series. The 'long-term' curve is sometimes
extrapolated for an additional 2 years. The chart's trend line coefficients
frequently are different that those returned by the equivalent linest()
function. The differences in coefficients result in slightly different
curves -- just enough to inspire bank executives to ask questions. I believe
that your response about line charts may be the real issue. I'll change the
approach and use scatter charts.

Thank you for your help!
 
G

Guest

I just started reading your thread and have become interested. I was at one
time trying to create data points from an anaysis of Stock closiings, that
are represented by a regression (6th degree) line. I was not able to use the
formula supplied in previous versions of Excel, prior to Ver 2003. But I
would be interested in trying this again, since i have the new version. I
would like to actually create data points as respesented by the regression
line, fi that is possible?

Thanks,
 
G

Guest

If by "6th degree", you mean a 6th degree polynomial, then you should be
conserned about whether you have sampled a wide enough range of data to be
able to reliably estimate the parameters. Even with independent parameters
and/or adequate data range, validation of the formula could be interesting.
Remember the famous quote of von Neumann "With four parameters I can fit an
elephant, and with five I can make him wiggle his trunk."

Jerry
 
G

Guest

Hi Jerry,

I usually have somewhere around 375 data points, which goes up by 1 each
week, as I am tracking a stock closing, but only on a weekly basis. Yes, it
is the 6th degree polynomial that I add to the graph. I tried to figure out
the individual data points represented by the graph, using the formula
provided by the graph, but the "point" was not very accurate and I am given
the understanding that this was due to rounding errors and the precion of
Excel, prior to Office 2003.

Maybe this has all changed? I had also run across a third party add-on that
was claiming to have increased the accuracy, but did not purchase it and have
not heard anyone else mention it or vouch for it. But if I could find a way
to figure out the individual points represented by the graph, I would be
interested in doing that.

Thanks for your help.
 
G

Guest

How widely spaced are these 375 x-values?

If the x-values are 1,2,3,... then fitting a 6th degree polynomial is a very
difficult problem numerically (condition number ~3E31). The direct algorithm
used by LINEST prior to 2003 would likely produce meaningless results. The
chart trendline and LINEST might be accurate to a few figures, but the
problem could be challenging for them as well. To reliably fit a problem
this numerically difficult, you might need a package that uses quadruple
precision (I don't know of any statistics programs that do) or arbitrary
precision (cf.
http://groups.google.com/group/microsoft.public.excel.programming/msg/d2fdea49d5c999a7
). You could probably do this in double precision by fitting orthogonal
polynomials, assuming that the x-values themselves are really accurate enough
to be worth the effort.

Your mention of "the formula provided by the graph" raises another issue:
while the chart trendline (unchanged with 2003) has always been quite good
numerically (better than almost all dedicated statistical packages, except
where they fit by orthogonal polynomials), by default very few figures of
this high quality fit get displayed on the graph. You need to right click on
the equation and change the numeric format to display scientific notation
with 14 decimal places.

Jerry
 
G

Guest

Hi Again Jerry,
What I am recording are weekly closes of stocks and the precision displayed
is to the 100ths. And it is graphed based on the time line of a week. I do
not know what orthogonal
polynomials are, so I would not know how to implement that, but the 6th
degree polynomial that is displayed appears very good to me. My understanding
is that it is based on a least squares and that the numbers are so large that
Excel has a difficult time calculating out as far as the equation requires?
Thansk again for your help.
 
G

Guest

Presumably the stock closes are what you want to predict. That is largely
irrelevant for determining the numeric difficulty of the problem. The issue
is what you are trying to use to predict them (a 6th degree polynomial in
what?), and whether you have sampled a wide enough range to have any hope of
accurately estimating those coefficients.

As I also tried to communicate, the numerical problems are not an Excel
issue, per se, since the accuracy of the Excel chart trendline and Excel
2003's LINEST function is comparable to that of dedicated statistics
programs. Accurately fitting 6th degree polynomials is often very difficult
numerically.

Potentially even more problematic is the question of whether accurately
fitting this 6th degree polynomial would tell you anything useful. Unless
you have some theoretical basis for expecting that the form of your model (a
6th degree polynomial in whatever) is right, then accurately fitting the past
gives no assurance that the model will accurately predict the future, even in
the near term. The more empirical parameters you have to include to fit the
past, greater the chance that the fitted model will have little or no
predictive power (recall the von Neumann quote).

Jerry
 
G

Guest

Hi Jerry,

I am not trying to predict, I am trying to find the specific data points
associated with a 6th degree polynomial. I have the two things only a Date
and a Close. These are compiled on a weekly basis for a stock each Monday.
Today I will add a close and a date to a stock after the market closes.
Nothing more or less.

I sometimes chart that stock, very simple, the dates and the closings. I can
add to that chart a 6th degree poly. But I do not know what the data points
are that are plotted and that is what I would like to know. I usually have
about 5 years worth of data, but I commonly only plot 2 years worth of data.
The data I use is actual closes for the stock on each Monday. After adding
the 6th degree poly, I would like to know where the line is on the chart for
any specific Monday and would like to add a column to the simple File that
says "Poly Close."

Date Close
1/1/1999 10.00
.... ...
.... ...
4/24/06 15.00
(Above is what I have)

(What I would like is)
Date Close PloyClose
1/1/1999 10.00 9.88
.... ... ...
.... ... ....
.... ... ...
4/24/06 15.00 14.65
 
G

Guest

Mitch said:
I need to run regression analyses (linear & nonlinear) on many data sets. I
have observed that the resulting equations generated in a chart trend line
and that generated by the linest() function are frequently very different. I
have read Tushar Mehta's explanation of the problem, but need advice on which
is more accurate.
 
G

Guest

I also would be interested in this. I have heard of a program that
significantly increases Excels capabilities beyond the 15 or 16 significant
digits. Has anyone used these and had positive results? It was some time ago
when I first heard of them, about the time this post originated.

Thanks
 
G

Guest

I have been studying the market with different statistical functions for many
years ; binomial law, standard deviation, normal law, etc. I've also
downloaded many programs trying to understand the behavior of the market and
it's not easy, just predicting the direction of the market is a task even if
at the end the risk/reward could generate lot's of profits or losses; how
many people brokers, individuals had predicted the black Monday in 1987 ?
Anyways why a 6th degree, why not 5th or 7th degree ? I believe you would
extract a minimum or a maximum like in a second degree equation (-b/2a for x)
if negative maximum (downslide) and if positive (uptrend) so this would give
i believe a range over time ????? i do hope i'm making sense here......!!!!
this is order normal conditions but what if in 5 days the barrel of oil would
go up or down by $3.00, what if the $us would trade at 115 yens, if interest
rates would increase/decrease of 1%, if Microsoft or whatever stock on the
Dow would increase/decrease his earnings of $1 how would the market react ????

So far one of the best thing i came over was a multiple regression (LINEST
function) not only you can play with the amount of periods, variables but
also you can play with the standard deviation of the equation and better yet,
you can see what impact each variable does in your equation. Believe me
currencies have a lot of impact for example the price of oil is in $US
however when converting into foreign currencies that price could be
attractive/expensive, same thing for gold, interest rates, options on
indexes, etc.

I remember in college doing a paper on a simple regression unemployment rate
and gallons of beer sold, it was at 0.9 over a long period period of term
(more then 30 months) it was logical that people since they dont work stayed
home and drank beer however people had lower income !!!! now if you would
join a third variable the correlation would increase or decrease a lot,
however if a fourth variable would be had up the correlation would stabilise
around 0.6. Even if you would change the 3rd or the 4th one, it would stay
around 0.6.

it seems also that you have a hard time with small numbers probably because
when using the date, Excel uses a number instead; 39000 represents Oct
10/2006, use Oct 10/1906 and it will recalculate it at 2475, so small numbers
will come lot less and reformat it just to see the date without the year.
Also dont forget to use the format scientific because if you have a variable
like 1.5666 E-25 it takes a mighty big number just to came to 0,01 (power
22-23-24).

Since June 2005, by using multiple regressions i have a compounded annual
rate of return of 16.05% wich would place me in Canada among in the first
best quartile of managers even if my worksheet uses 60 megs and i believe if
i fine tuned variables, periods, ANOVA, decision trees, study sectors, have
the time also to compare forecasts and results i would out perform the
general trend of the market.

On the uptrend it's easy to buy stocks and out perform the index but it
comes harder and harder to say; time to take to my profits and run away and
stay away, on the side line until a stock reaches a buttom and buy it back
when it reaches the buttom unless you're into put options.

last week i have sold some units of a Japaneese fund that tracks the Nikkei
index that i had bought in October 2006 at $5.5 and sold them at $6.80
because the probability of reaching well over $7 was decreasing but on the
other hand the probability of increases in the price of gold, oil, bonds,
high-tech, small caps was slowly increasing even if the market was high so
with the income i have reinvested 50% into a fund that tracks the canadian
index and kept 50% on the sideline in case if.........
 
G

Guest

Hi Again,

Yes many variables that can be looked at, but my analysis is fairly simple.
It is an experiment and work in progress. I still remain interested in find
the "data" points. I bottom fish now, but it appears there might be a better
way, especially in an up trend. Some of the downs are easy to spot, this type
of market is much harder I think.
 
G

Guest

A couple of Excel add-ins that support user specified numeric precision are
http://digilander.libero.it/foxes/index.htm
http://precisioncalc.com/

You can also get infinite or user specified precision from symbolic
manipulators
such as Maple, Mathematica, MacSyma and open source packages like Maxima
http://maxima.sourceforge.net/

As I mentioned last May, Excel's precision is no better or worse than that
of dedicated statistics packages. Precision issues between Excel and other
packages are more a function of the algorithm employed, than the inherent
precision.

Jerry
 
S

Statistical interest

Folks ... a quick question - can you do a multi-variate non-linear regression
in Excel? ...the trendline clearly does that with one variable. But what if
you have multiple variables?
 
D

David

Thank you Jerry

David

Jerry W. Lewis said:
A couple of Excel add-ins that support user specified numeric precision are
http://digilander.libero.it/foxes/index.htm
http://precisioncalc.com/

You can also get infinite or user specified precision from symbolic
manipulators
such as Maple, Mathematica, MacSyma and open source packages like Maxima
http://maxima.sourceforge.net/

As I mentioned last May, Excel's precision is no better or worse than that
of dedicated statistics packages. Precision issues between Excel and other
packages are more a function of the algorithm employed, than the inherent
precision.

Jerry
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top