Re: Regression significance conundrum
- From: G Robin Edwards <robin.edwards@xxxxxxxxxxxxx>
- Date: Thu, 20 Oct 2005 22:26:47 +0100
In article <1129805309.e1ca1626fe3c3faf559ac1478032dab9@teranews>,
Andy Spragg <andy.spragg@xxxxxxx> wrote:
> I'm revisiting some old bivariate data (66 values) with a fresh pair
> of statistical spectacles (last time I didn't /have/ any statistical
> spectacles). My eye told me then, and still tells me now, that there
> is a good linear correlation between the two variables. Last time I
> just fitted a straight line, got an R-squared of 0.79, and was quite
> happy.
> Now I know better. This time, I fitted a straight line and discovered
> that the constant and gradient are both statistically highly
> significant (p values both 0 to 3dp). I checked the residuals. They
> are beautifully normally distributed. Only two (when standardized) are
> unusual in 95% confidence terms (and with 66 data points, I expect two
> or three unusual residuals anyway). No pattern when they're plotted
> against the order of the data, or against the fitted value. So my
> original correlation was far more legitimate than I realised.
> Then I managed to rain on my own parade. I decided that actually,
> there might be a slight curvature in the data, and I might do better
> if I fitted a quadratic. So I tried it. I expected the constant and
> the gradient to remain highly significant, and that the stats would
> tell me whether or not the additional term was also statistically
> significant.
> What I actually found is that in the quadratic fit, /none/ of the
> three coefficients are significant at the 95% level (p values 0.084,
> 0.161 and 0.404 for the constant, linear and quadratic terms
> respectively)! However, the R-squared is the same as for the linear
> regression, and all the observations about the residuals remain valid.
> The only difference is three unusual residuals rather than two, and
> the observation at each end of the data set is flagged as having large
> influence.
> So what's going on here? If I had started with the quadratic
> regression, I would apparently have concluded with 95% confidence that
> my data set was random noise about a mean value of 0. How come the
> stats don't show that a linear regression is highly significant and
> that a quadratic fit does not confer significant additional benefit?
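My first attempt would be to standardise your x values (subtract the
mean, divide by the standard deviation) and then square the standardised
values to get the quadratic term. Then do your polynomial (or polynomials,
if you decide to try a cubic) regression. This does not change the fitted
curve, but it often makes the relative effects of the terms clearer to the
investigator's eye. You will find R-squared, the residuals and the p-value
of the quadratic term to be identical to the ones you get with
unstandardised data, which should be reassuring. That unchanged p-value
(0.404) is the test you were after: the curvature adds nothing
significant.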
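A minimal sketch of that comparison in Python with numpy. The original data aren't available, so it uses made-up data (66 evenly spaced x values between 40 and 60, a straight line plus normal noise, fixed seed), and `ols` is a hypothetical helper, not a library function. It reports t statistics rather than p-values so that only numpy is needed; with 63 residual degrees of freedom, |t| above about 2 corresponds to p < 0.05.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit: coefficients, standard errors, t statistics, R-squared."""
    n, k = X.shape
    P = np.linalg.pinv(X)                        # (X'X)^-1 X', computed via SVD
    beta = P @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                 # residual variance
    se = np.sqrt(s2 * np.sum(P ** 2, axis=1))    # sqrt of diag of s2 * (X'X)^-1
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, beta / se, r2


# Made-up data: 66 points on a straight line plus noise, x well away from zero
rng = np.random.default_rng(1)
x = np.linspace(40, 60, 66)
y = 5.0 + 0.8 * x + rng.normal(0.0, 2.0, x.size)

# Quadratic in the raw x values
X_raw = np.column_stack([np.ones_like(x), x, x ** 2])

# Quadratic in standardised x: standardise first, then square
z = (x - x.mean()) / x.std(ddof=1)
X_std = np.column_stack([np.ones_like(z), z, z ** 2])

_, _, t_raw, r2_raw = ols(X_raw, y)
_, _, t_std, r2_std = ols(X_std, y)

print(f"R-squared: raw {r2_raw:.6f}, standardised {r2_std:.6f}")
print("t (constant, linear, quadratic), raw:         ", np.round(t_raw, 2))
print("t (constant, linear, quadratic), standardised:", np.round(t_std, 2))
```

The two R-squared values agree and the quadratic term's t statistic is the same in both fits, because both designs span the same columns; only the constant and linear terms change.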
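Collinearity between the linear and quadratic terms will have largely
vanished (entirely, if your x values are symmetric about their mean),
and the "strange" p-values you've seen for the constant and linear terms
will have resolved into something "sensible". If your x values are well
away from zero, x and x-squared are almost perfectly correlated, so the
fit cannot tell how to share the slope between them and both standard
errors balloon; and the raw constant is the fitted value at x = 0, which
may lie far outside your data.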
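A quick numpy check of the collinearity point, on hypothetical x values only (no y needed): one evenly spaced set, which is symmetric about its mean, and one deliberately skewed set. `linear_quadratic_corr` is a made-up helper name.

```python
import numpy as np


def linear_quadratic_corr(x):
    """Correlation between x and x^2, before and after standardising x."""
    z = (x - x.mean()) / x.std(ddof=1)
    return np.corrcoef(x, x ** 2)[0, 1], np.corrcoef(z, z ** 2)[0, 1]


x_sym = np.linspace(40, 60, 66)                # evenly spaced: symmetric about its mean
x_skew = 40 + 20 * np.linspace(0, 1, 66) ** 2  # bunched at the low end: skewed

for name, x in [("symmetric", x_sym), ("skewed", x_skew)]:
    r_raw, r_std = linear_quadratic_corr(x)
    print(f"{name:9s}  corr(x, x^2) = {r_raw:.4f}   corr(z, z^2) = {r_std:.4f}")
```

For the symmetric set the standardised correlation is zero apart from rounding; for the skewed set it drops well below the raw value but does not vanish, which is why the claim is hedged to symmetric x.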
The "regression plots" and their confidence intervals will be identical,
but will have different scales for the X axis.
This should reassure you completely about the regression on standardised
X data.
If you go a bit further and compute the variance inflation factors for
the quadratic models you'll see huge values for the untransformed
variables and values close to 1 for the standardised regression.
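A small sketch of the variance inflation factor calculation, using the standard identity that the VIFs of the predictors in a model with an intercept are the diagonal of the inverse of their correlation matrix. The x values are hypothetical and `vif` is an illustrative helper, not a library function.

```python
import numpy as np


def vif(*columns):
    """Variance inflation factors for the given predictor columns (intercept implied):
    the diagonal of the inverse of their correlation matrix."""
    R = np.corrcoef(np.column_stack(columns), rowvar=False)
    return np.diag(np.linalg.inv(R))


x = np.linspace(40, 60, 66)            # hypothetical x values, well away from zero
z = (x - x.mean()) / x.std(ddof=1)     # standardised x

print("VIF, raw x and x^2:         ", np.round(vif(x, x ** 2), 1))
print("VIF, standardised z and z^2:", np.round(vif(z, z ** 2), 3))
```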
Influence statistics (leverage, Cook's distance) for any influential
points, such as your two end observations, will be identical in both
fits, again indicating that the standardised regression is absolutely
valid.
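A sketch of why the influence statistics cannot change: leverage is the diagonal of the hat matrix, which depends only on the space spanned by the columns 1, x, x^2, and standardising x leaves that space unchanged. The data are hypothetical (evenly spaced x plus a noisy straight line) and `leverage_and_cooks` is an illustrative helper.

```python
import numpy as np


def leverage_and_cooks(X, y):
    """Hat-matrix diagonal (leverage) and Cook's distance for an OLS fit of y on X."""
    n, k = X.shape
    P = np.linalg.pinv(X)
    h = np.einsum("ij,ji->i", X, P)              # diag of hat matrix H = X X^+
    resid = y - X @ (P @ y)
    s2 = resid @ resid / (n - k)                 # residual variance
    cooks = resid ** 2 / (k * s2) * h / (1 - h) ** 2
    return h, cooks


rng = np.random.default_rng(2)
x = np.linspace(40, 60, 66)                      # hypothetical data
y = 5.0 + 0.8 * x + rng.normal(0.0, 2.0, x.size)
z = (x - x.mean()) / x.std(ddof=1)

h_raw, d_raw = leverage_and_cooks(np.column_stack([np.ones_like(x), x, x ** 2]), y)
h_std, d_std = leverage_and_cooks(np.column_stack([np.ones_like(z), z, z ** 2]), y)

print("same leverage:       ", np.allclose(h_raw, h_std))
print("same Cook's distance:", np.allclose(d_raw, d_std))
print("largest leverage at index", int(np.argmax(h_std)), "of", x.size)
```

With evenly spaced x, the largest leverage in a quadratic fit falls on the first and last observations, consistent with the end points being flagged.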
That's what I'd do, anyway, though I'm not a statistician.
Robin