# Confidence / prediction intervals in multiple linear regression

J

#### Jon

I am trying to find some vba code / method of calculating the
confidence intervals for the mean of the dependent y and the
prediction interval for the dependent y in multiple linear regression
models. I want to calculate these in excel rather than a stats
package. Any code, ideas or suggestions for any online groups that
are more stats based would be welcome. Thanks.

G

#### Guest

Excel does not natively provide all the information necessary in a convenient
form. Therefore how much YOU understand of statistics is more to the point.
Absent information about the specific multiple regression model you want to
fit, I will have to assume that you understand basic linear algebra as well.
Sorry in advance if this is too heavy to slog through.

You will need to calculate the variance of an estimated point on the
multiple regression. Suppose X is the "design matrix" (the array that you
would pass to LINEST, augmented with a column of ones [unless you are not
fitting a constant term]). If b is the corresponding vector of coefficient
estimates (a column, and in reverse order to the LINEST output), then
=MMULT(X,b) gives the estimated multiple regression at your data points, i.e.
the same output as =TREND(known_y's,known_x's,,const). The predicted value
at a given point on the multiple regression would be =MMULT(v,b) where v is
the row of X corresponding to the point (if it is in the data set), or is
constructed similarly (if it is not in the data set). You will need D which
is calculated as
=MMULT(MMULT(v,MINVERSE(TRANSPOSE(X),X),TRANSPOSE(v))
The variance of an estimated point on the multiple regression is then
=D*MSE, where MSE is =sey^2 and sey is one of the quantities output by
=LINEST(known_y's,known_x's,const,TRUE). Similarly, the variance of a
predicted future point that follows the same multiple regression is
=(1+D)*MSE.

A 95% 2-sided confidence interval for a point on the multiple regression
line is then
=MMULT(v,b) +/- SQRT(D*MSE)*TINV(0.05,df)
A 95% 2-sided prediction interval for a new point that follows the multiple
regression is
=MMULT(v,b) +/- SQRT((1+D)*MSE)*TINV(0.05,df)

Some simplification may be possible given knowledge of the particular
regression model that you want. For instance, with simple linear regression:
MSE reduces to STEYX(known_y's,known_x's)
=MMULT(v,b) reduces to
=FORECAST(x,known_y's,known_x's)
or equivalently to
=INTERCEPT(known_y's,known_x's) +x*SLOPE(known_y's,known_x's)
and D reduces to
=1/COUNT(known_x's) +(x-AVERAGE(known_x's))^2/DEVSQ(known_x's)

Jerry