How does LINEST work?

G

Guest

I am trying to understand the explanation for how the LINEST function
regresses that is given here: http://support.microsoft.com/kb/828533

On the QR Decomposition example worksheet there are a couple of steps that I
don't understand how to go from A to B that I would like to know if there is
any further explanation anywhere that I cannot find.

First is the section where the X premultiplied by P matrix is re-written
with the comment Rewrite: effectively 0 --> 0: What is happening here?

Finally at the conclusion of the example the regression coefficients are
backcalculated from the matrix as 1.25, 0 and 0. Yet the results of the
LINEST for this example give different coefficients (-3.111, 0 and 1.222).
Why are those numbers different.

Full disclosure - I am not a mathmetician by any stretch, just a ChemE who
skipped most of my math classes at Uni, so explanations that don't involve me
looking up big long words in big heavy books would especially be appreciated.
 
G

Guest

Given your disclaimer, do you really want to chase through the details or do
you just want a general concept of what is being returned?

For a general concept, LINEST returns a least squares solution, i.e. the
parameter estimates lead to estimated y values that minimize =SUMSQ(y-y_est).


If there is no collinearity (if none of the predictors are redundant), then
there is only one least squares solution. LINEST in all versions of Excel
attempt to calculate that unique solution. The only difference between
pre-2003 (normal equations) and post-2003 (QR decomposition) LINEST is how
efficiently the calculations use the available precision (if Excel had
infinite precision, both approaches would return exactly the same estimates.

Other than the numerically better algorithm, the only difference between
pre-2003 and post-2003 LINEST (when there is no collinearity) is the R^2, F
and SSreg (rows 3 to 5 of 1st column of LINEST output. Prior to 2003, the
LINEST calculations were based on an error in concept.

When there are redundant predictors (collinearity), pre-2003 LINEST would
return an error value, whereas post-2003 returns a least squares solution (a
particular one of infinitely many possible least squares solutions). While
the estimates are not unique in this case, the summary statistics in rows 3
through 5 of the LINEST output are still unique (i.e. any of the possible
least squares solutions would produce identical summary statistics)

For details, you might find either of the following useful
http://en.wikipedia.org/wiki/QR_decomposition
http://www.physics.louisville.edu/help/nr/bookfpdf/f2-10.pdf

Jerry
 
G

Guest

There are two problems in your use of the knowledge base [KB] example, one is
yours (exacerbated by Microsoft [MS]), the other is Microsoft's alone.

You must array enter (Ctrl-Shift-Enter) every block of formulas involving
MMULT(). MS mentioned entering array formulas early in the article, but I
see nothing in the write-up about all the array formulas in this particular
example.
- Select the block I29:L32, where every cell contains the formula
=MMULT(E29:E32,TRANSPOSE(E29:E32)), and simultaneously press
Ctrl-Shift-Enter. If you do it right, I29:L32 will become a symmetric array
instead of a column vector repeated four times. In the formula bar, you will
see {=MMULT(E29:E32,TRANSPOSE(E29:E32))} even though you did not type the
curly brackets; in general, this is how your recognize that Excel understands
this as an array formula.
- Similarly, array enter the formulas in the blocks G35:I38, L35:L38,
I52:K54, H58:I60, and L58:L60

At this point, the "Rewrite: effectively 0 --> 0:" comment should be
obvious; you are replacing numbers that are effectively zero (<1E-15) with
explicit zeroes.

The other problem with the example is that when I copy/paste the whole thing
into a worksheet, I get two extraneous blank lines after row 68; this means
that several of the subsequent formulas are pointing to the wrong cells.
Copy from "QR Decomposition main loop terminates because longest remaining
sub-vector has length 0" to the end of the example and paste into the
worksheet beginning at cell A69. Cells G71:H71 should now contain the first
two LINEST 2003 coefficients (1.222222 and 0). The third coefficient
(-3.11111) is calculated separately in C80.

Also potentially confusing are the references to bold cells (presumably
A47:A49 and D47:E49) that are not formatted bold in the KB example.

Jerry
 
Top