Re: Multicollinearity !!!!!
- From: Greg Heath <heath@xxxxxxxxxxxxxxxx>
- Date: Fri, 8 Aug 2008 21:05:21 -0700 (PDT)
On Aug 8, 1:19 pm, Bruce Weaver <bwea...@xxxxxxxxxxxx> wrote:
Peter wrote:
About using composite variables... or at least related to that... I haven't seen anyone comment on principal components analysis as a solution to multicollinearity problems, and I was wondering if there is a specific reason for this. I noticed that many of you are very much against "black-box" statistics, but on the other hand, PCA coordinates are often interpretable and even more meaningful for the problem than the original variables. Secondly, interpretability is not an issue in every problem; for example, I suppose it is not if you are more into data mining and prediction than into statistical inference. Any opinions?
See the comments on PCA in this post.
http://groups.google.com/group/sci.stat.math/msg/5e5a9e4adf54ae5f?dmo...
Message-ID: <1145419614.825649.32950@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
From that post:
On Apr 19 2006, 12:06 am, "Reef Fish" <Large_Nassau_Grou...@xxxxxxxxx>
wrote:
sangdon...@xxxxxxxxx wrote:
MLR model is (from matrix algebra)
Y = XB
The least square estimator of B is
B_hat = inv(X'X)X'Y
...
Multicollinearity is a ubiquitous problem affecting the
computation of the inverse matrix.
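A rough sketch of why the explicit inverse is fragile: forming X'X squares the condition number of X, so near-collinear columns hurt the normal equations far more than they hurt X itself. The data below are made up for illustration (two almost identical predictors plus an intercept) and are not from the thread.

```matlab
% Illustrative data (an assumption, not from the original post):
% an intercept and two nearly collinear predictors.
t  = (1:50)';
x1 = t / 50;
x2 = x1 + 1e-4 * sin(t);      % almost an exact copy of x1
X  = [ones(50,1) x1 x2];

condX   = cond(X);            % 2-norm condition number of the design matrix
condXtX = cond(X' * X);       % approximately condX^2 for the normal equations

fprintf('cond(X)    = %.3g\n', condX);
fprintf('cond(X''X)  = %.3g\n', condXtX);
```

Methods that work on X directly (QR, SVD) avoid this squaring, which is presumably why the explicit inv(X'X) route is discouraged.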
Practitioners of MATLAB do not use explicit matrix inversion
to solve the linear regression equations. Typically, they either
use the syntax
B = X\Y
which automatically uses a QR decomposition solution that
yields a BASIC solution when X is singular... or they use
B = pinv(X)*Y
where pinv is a pseudo-inverse operator that yields the
B with minimum L2 norm.
The BASIC solution automatically yields at least p-r zero
coefficients if the rank of X is r < p. However, the choice of
which variables to include in the regression is based on
numerical stability via column pivoting and does not
necessarily imply that those variables are the best for
understanding the underlying input/output relationship.
In contrast, the minimum norm solution tends to produce
solutions with no zero coefficients.
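A small sketch of the two solutions on a deliberately rank-deficient X. The data are invented for illustration: the third column is the sum of the first two, so p = 3 and r = 2. Which coefficient the basic solution zeroes out depends on the pivoting, so no particular pattern is claimed here.

```matlab
% Illustrative rank-deficient design (an assumption, not from the post):
% the third column is the sum of the first two.
t  = (1:20)';
x1 = sin(t);
x2 = cos(2*t);
X  = [x1 x2 x1 + x2];          % p = 3 columns, rank r = 2
Y  = 2*x1 - x2;                % lies in the column space, so the fit is exact

B_basic = X \ Y;               % MATLAB warns about rank deficiency here
B_norm  = pinv(X) * Y;         % minimum 2-norm solution

disp([B_basic B_norm]);        % side-by-side coefficients
fprintf('rank(X) = %d\n', rank(X));
fprintf('residuals: %.2e (backslash), %.2e (pinv)\n', ...
        norm(X*B_basic - Y), norm(X*B_norm - Y));
fprintf('norms:     %.4f (backslash), %.4f (pinv)\n', ...
        norm(B_basic), norm(B_norm));
```

Both vectors reproduce Y equally well; they differ only in how the redundant direction is resolved, which is why neither says much about which variables "matter".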
Another option is
B = stepwisefit(X,Y)
which, among other options, can be used for stepwise backward
elimination and stepwise forward selection.
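A minimal sketch of a stepwisefit call, assuming the Statistics Toolbox is installed. The data are invented: four candidate predictors, of which only columns 1 and 3 drive Y. The 'display' option and the fourth output (a logical mask of retained columns) are as I understand the toolbox documentation; check your release.

```matlab
% Requires the Statistics Toolbox (stepwisefit is not in base MATLAB).
% Illustrative data (an assumption): only columns 1 and 3 drive Y.
t = (1:100)';
X = [sin(t) cos(3*t) sin(0.7*t).^2 mod(t,7)];
Y = 3*X(:,1) - 2*X(:,3) + 0.01*cos(11*t);

% Default behaviour: start with no terms and add/remove by p-value.
[b, se, pval, inmodel] = stepwisefit(X, Y, 'display', 'off');

disp(find(inmodel));   % indices of the columns that ended up in the model
disp(b(inmodel));      % their coefficients (the intercept is handled internally)
```

Starting from the full model instead (for backward elimination) should be possible by passing an initial 'inmodel' mask of all true, but that is untested here.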
PCR (Principal Components Regression) regresses Y on the PCs.
If Y is regressed on ALL of the PCs, then it gets exactly the
same result back as the original full regression.
So, PCR generally regresses Y on the PCs that have the largest
eigenvalues (the PCs with the largest variances) and drops the
PCs with the smallest eigenvalues.
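The claim that regressing on ALL of the PCs returns the full regression can be checked numerically. This is only a sketch on invented data; the PCs are taken from an SVD of the centered X, and k = 2 is an arbitrary choice for the truncated version.

```matlab
% Illustrative data (an assumption): columns 1 and 3 are nearly collinear.
t  = (1:60)';
X  = [sin(t) cos(2*t) sin(t) + 0.05*cos(5*t)];
Y  = X * [1; -2; 0.5] + 0.1*sin(9*t);

n  = size(X,1);
Xc = X - repmat(mean(X), n, 1);   % center the predictors
Yc = Y - mean(Y);                 % center the response

[U, S, V] = svd(Xc, 'econ');      % columns of V are the PC directions
T = Xc * V;                       % PC scores, largest variance first

B_ols = Xc \ Yc;                  % ordinary least squares on centered data

B_all = V * (T \ Yc);             % regress on all PCs, map back to X space

k   = 2;                          % keep only the k largest-variance PCs
B_k = V(:,1:k) * (T(:,1:k) \ Yc);

fprintf('max |B_all - B_ols| = %.2e\n', max(abs(B_all - B_ols)));
disp([B_ols B_k]);                % full fit vs. truncated PCR coefficients
```

Only the truncated fit differs from ordinary least squares, so any benefit or harm from PCR comes entirely from which PCs are dropped.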
That's exactly what's WRONG with PCR, because the discarded
PCs may account for most of the fit in Y while the kept PCs
still contain all of the original X's. That was one of the
points in the Hadi-Ling paper.
This can be mitigated by choosing PCs via QR or stepwisefit.
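To illustrate the objection, here is a constructed case (invented data, not from the Hadi-Ling paper) in which Y depends only on the small difference between two nearly collinear predictors. Ranking the PCs by their fit to Y rather than by variance is one simple stand-in for the QR or stepwise selection mentioned above.

```matlab
% Illustrative data (an assumption): Y lives in the low-variance direction.
t  = (1:80)';
x1 = sin(t);
x2 = x1 + 0.05*cos(3*t);          % nearly collinear with x1
X  = [x1 x2];
Y  = x2 - x1;                     % depends only on the small difference

n  = size(X,1);
Xc = X - repmat(mean(X), n, 1);
Yc = Y - mean(Y);

[U, S, V] = svd(Xc, 'econ');
T = Xc * V;                       % PC scores, ordered by decreasing variance

% R^2 obtained by regressing Y on each PC alone
r2 = zeros(1, size(T,2));
for j = 1:size(T,2)
    res   = Yc - T(:,j) * (T(:,j) \ Yc);
    r2(j) = 1 - sum(res.^2) / sum(Yc.^2);
end

pcVar = diag(S)'.^2 / (n - 1);
disp([pcVar; r2]);                % row 1: PC variances, row 2: R^2 with Y
[~, order] = sort(r2, 'descend'); % rank PCs by fit to Y, not by variance
disp(order);
```

Dropping the smallest-eigenvalue PC here would throw away nearly all of the fit, while selecting on the response keeps it.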
... ALL of the variables in a multiple regression (and their
coefficients) are inter-related (in the partial correlation
sense), and one cannot isolate a single variable and say that
is the "most important" because it (sic) is ill-defined in
any sense of that term.
Not true. Once a model is chosen, the variables can be ranked
by the change in a suitable measure like adjusted R^2 when
that variable is removed. However, if the model is changed by
removing the 1st or last ranked variable, the relative rankings
of the remaining variables will change.
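A sketch of the ranking described above: fit the chosen model, then refit with each variable removed in turn and record the drop in adjusted R^2. The data and the three-predictor model are invented for illustration; the ranking only holds for this model and would need recomputing after any variable is removed.

```matlab
% Illustrative data (an assumption): three predictors, two of them related.
t = (1:100)';
X = [sin(t) cos(2*t) sin(t) + 0.3*cos(5*t)];
Y = 2*X(:,1) - X(:,2) + 0.5*X(:,3) + 0.2*sin(13*t);

n   = size(X,1);
p   = size(X,2);
sst = sum((Y - mean(Y)).^2);

A    = [ones(n,1) X];                         % full model with intercept
res  = Y - A * (A \ Y);
full = 1 - (sum(res.^2)/(n - p - 1)) / (sst/(n - 1));   % adjusted R^2

drop = zeros(1, p);
for j = 1:p
    keep = setdiff(1:p, j);                   % leave variable j out
    Aj   = [ones(n,1) X(:,keep)];
    resj = Y - Aj * (Aj \ Y);
    adjj = 1 - (sum(resj.^2)/(n - (p-1) - 1)) / (sst/(n - 1));
    drop(j) = full - adjj;                    % loss in adjusted R^2
end

[~, ranking] = sort(drop, 'descend');         % largest loss ranks first
disp([ranking; drop(ranking)]);
```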
(... long story short, I gave
up ... and use PCR and PLS)
I have used PCs with the selection determined by QR or
stepwise fit. PLS appears to be a better approach. However,
I have never taken the time to program it in MATLAB.
Commercial versions are available but I am too stingy to shell
out that kind of money.
Hope this helps.
Greg