Regression with zero intercept

T

TheMoth

When I regress 86 pairs of lab duplicates, I get a better fit by hard-coding
0 intercept than letting XL calculate it. This seems wrong: the least squares
regression should be the best fit of the data. I compared LINEST and the
Regression tool in the Data Analysis Tool Pak and they yield the same answer.
I suspected that XL adds a (0,0) to the data set because the total df in the
ANOVA output is one larger for the fixed intercept, but testing that with RSQ
yielded a different value. The stats (below) all favor the fixed intercept;
even the calculated slope is closer to 1 and the confidence limits are
tighter:

Stat b=calc b=0
Intercept -0.02048 0
Rsq 0.9929 0.9991
std Err 0.2318 0.2306
Slope 1.0045 1.0020
std Err 0.0093 0.0033
L 95.0% 0.9861 0.9955
U 95.0% 1.0230 1.0085

Thanks in advance for any input.

S

Stan Brown

Fri, 18 Jan 2008 13:40:00 -0800 from TheMoth
When I regress 86 pairs of lab duplicates, I get a better fit by hard-coding
0 intercept than letting XL calculate it. This seems wrong: the least squares
regression should be the best fit of the data. I compared LINEST and the
Regression tool in the Data Analysis Tool Pak and they yield the same answer.
I suspected that XL adds a (0,0) to the data set because the total df in the
ANOVA output is one larger for the fixed intercept, but testing that with RSQ
yielded a different value. The stats (below) all favor the fixed intercept;
even the calculated slope is closer to 1 and the confidence limits are
tighter:

Stat b=calc b=0
Intercept -0.02048 0
Rsq 0.9929 0.9991
std Err 0.2318 0.2306
Slope 1.0045 1.0020
std Err 0.0093 0.0033
L 95.0% 0.9861 0.9955
U 95.0% 1.0230 1.0085

These are not large differences, but like you I am surprised by their
direction.

What do the residuals look like under either scenario? It's fairly
easy (in Excel) to calculate the sum of squares of residuals for each
of the two lines. I suspect (I hope) that the figure will be a bit
larger for the b=0 line.

J

Jerry W. Lewis

Rsq values with and without an intercept are not directly comparable. With
an intercept, Rsq indicates the improvment of regression over simply
representing each value by the mean of all the values. Without an intercept,
Rsq indicates the improvement of regression over representing each value by
zero. Unless the mean of the data values happens to be zero, Rsq without an
intercept will be larger than Rsq with an intercept, but that does not mean
that the fit is better.

Jerry