Re: Multicollinearity !!!!!



On Thu, 07 Aug 2008 11:04:19 -0400, Paul Rubin <rubin@xxxxxxx> wrote:


I interpret multicollinearity as saying that the predictors have some
information in common, and each has some unique information to
contribute. If you delete a predictor, you lose its unique
information. So we agree there's a cost/risk to this. Assuming there's
no external information that will let you determine which predictors are
most expendable, you pretty much have to pick one (or more, depending on
how many relationships exist), drop it and take your chances. One thing
I suggest to my students is to identify which predictors are collinear,
then run the model with each dropped individually and look at the
residuals. I'm of the opinion that if model A has a slightly higher
adjusted R^2, slightly better AIC, whatever, but its residuals don't
pass for noise, and model B has worse R^2 etc. but more pattern-free
residuals, I'll go with B. But at the end of the process, I'll drop
some variable(s), because I can't do much useful inference with a model
whose VIFs are big.
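
For anyone who wants to try that procedure, here is a rough sketch in Python with NumPy. The data are simulated and the helper names (vif, residuals_without) are made up for illustration, not taken from any package. It computes a VIF for each predictor, then refits with each collinear predictor dropped in turn so the residuals can be inspected.

```python
import numpy as np

def vif(X):
    """VIF of each column of X: 1 / (1 - R^2) from regressing it on the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / tss
        out[j] = 1.0 / (1.0 - r2)
    return out

def residuals_without(X, y, j):
    """Residuals of the least-squares fit of y on X with column j dropped."""
    A = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

# Simulated data (an assumption for illustration): z is nearly a copy of x.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = x + 0.05 * rng.normal(size=n)
w = rng.normal(size=n)                      # unrelated to x and z
X = np.column_stack([x, z, w])
y = 1.0 + 2.0 * x + 0.5 * w + rng.normal(size=n)

vifs = vif(X)
print("VIFs for x, z, w:", vifs)

# Drop each collinear predictor in turn and keep the residuals for inspection.
resids = {j: residuals_without(X, y, j) for j in (0, 1)}
for j, r in resids.items():
    print("dropped column", j, "residual std:", r.std())
```

In practice the residuals would be plotted against fitted values or against the dropped predictor; the printed standard deviation is only a placeholder for that visual check.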


Consider a simple case of regressing y on x and z and finding a great
deal of multicollinearity but good joint prediction. You can do very
good inference: You can infer that the data tells you that some
combination of x and z explains y, but that the data doesn't tell you
which of the two is responsible.
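
To make the simple case concrete, here is a small simulation in Python with NumPy. The numbers (sample size, noise levels, true coefficients) are my own assumptions. It fits y on x and z when z is almost a copy of x, then compares the standard errors of the individual coefficients with the standard error of their sum.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
z = x + 0.01 * rng.normal(size=n)           # z is almost identical to x
y = x + z + 0.1 * rng.normal(size=n)        # both enter with coefficient 1

A = np.column_stack([np.ones(n), x, z])     # intercept, x, z
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
sigma2 = (resid @ resid) / (n - A.shape[1])

# Coefficient covariance matrix. An explicit inverse is tolerable here only
# because the example is tiny; it is not a recommendation for real work.
cov = sigma2 * np.linalg.inv(A.T @ A)
se = np.sqrt(np.diag(cov))
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Standard error of the linear combination coef_x + coef_z.
c = np.array([0.0, 1.0, 1.0])
se_sum = np.sqrt(c @ cov @ c)

print("R^2:", r2)
print("coef x, z:", coef[1], coef[2], "se:", se[1], se[2])
print("coef x + coef z:", coef[1] + coef[2], "se:", se_sum)
```

The joint fit is excellent and the sum of the two coefficients is pinned down tightly, while each coefficient on its own has a standard error of about the same size as the coefficient itself. That is the inference described above: some combination of x and z explains y, and the data cannot say which one.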

The risk of most approaches to "handling" multicollinearity is that
one variable gets dropped, and then the user believes he has evidence
that the other variable is responsible for the explanation.
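
A sketch of that risk, again in Python with NumPy on simulated data (all numbers are assumptions, and ols is a throwaway helper). Here y is generated from z alone, z is then dropped, and the model is refit on x alone.

```python
import numpy as np

def ols(A, y):
    """Least-squares coefficients and their standard errors."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = (resid @ resid) / (A.shape[0] - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, se

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
z = x + 0.01 * rng.normal(size=n)
y = 2.0 * z + 0.1 * rng.normal(size=n)      # by construction, only z drives y

# "Handle" the multicollinearity by dropping z and keeping x.
coef_x, se_x = ols(np.column_stack([np.ones(n), x]), y)
t_x = coef_x[1] / se_x[1]
print("coef on x:", coef_x[1], "se:", se_x[1], "t:", t_x)
```

The reduced model reports a large t statistic for x even though x has no direct role in how y was generated. Nothing in the output of the reduced model warns the user about this.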




I teach business students, so there is absolutely no danger of their
rolling their own. I'm curious why you think LS is hard. If I were
doing my own code, I would have three concerns: speed on large
datasets; coping with accumulated rounding error on large datasets; and
dealing with stiffness. Stiffness can occur anywhere, but the first two
concerns I think are really restricted to rather large datasets. There
are lots of libraries for matrix operations that have inversion or
linear system solvers that (allegedly) are reliable when the matrix is
stiff. So I guess I would not hesitate to write my own code as long as
I were dealing with small to medium datasets -- but under no
circumstances would I write my own matrix inversion routine, nor would I
necessarily use a generic one from a source I didn't trust.
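
As an illustration of the inversion concern, here is a sketch in Python with NumPy. What is called stiffness above is usually called ill-conditioning in the linear algebra literature. The polynomial design matrix is an assumed example, chosen only because it is badly conditioned. The sketch compares solving the normal equations with an explicit inverse against a library least-squares solver.

```python
import numpy as np

# Assumed example: degree-7 polynomial design on [0, 1].
t = np.linspace(0.0, 1.0, 50)
A = np.vander(t, 8, increasing=True)        # columns 1, t, t^2, ..., t^7
beta_true = np.ones(8)
y = A @ beta_true                           # noise-free, so the exact answer is known

cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)          # roughly cond_A squared

# Textbook formula: beta = (A'A)^{-1} A'y with an explicit inverse.
beta_normal = np.linalg.inv(A.T @ A) @ (A.T @ y)

# Library least-squares solver, which works on A directly.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

err_normal = np.max(np.abs(beta_normal - beta_true))
err_lstsq = np.max(np.abs(beta_lstsq - beta_true))

print("cond(A):", cond_A, "cond(A'A):", cond_AtA)
print("max error, normal equations:", err_normal)
print("max error, lstsq:", err_lstsq)
```

Forming A'A squares the condition number, so rounding error is amplified far more than when the solver works on A itself. That is one reason to lean on a trusted library routine instead of a hand-written inverse.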

That said, as long as R is free, I see no reason to ever write my own.

/Paul

I agree, as long as a suitable inversion routine is used there's no
big problem.

-***