Re: Regression of correlated variables




Paige Miller wrote:
On 11/30/2006 12:03 PM, Greg Heath wrote:
Tim vor der Brück wrote:

I would like to do a regression of highly correlated variables. Could
somebody give me a hint about the best method? Principle Component
Analysis?

Terminology: Principal

I will assume you mean Multiple Linear Regression (y = X*b).

There is no general "best method". It depends what you want.

3. Partial Least Squares (PLS) provides a sequence of approximate
solutions based on sequences of orthogonal linear combinations
of the predictors constrained to maximize the covariance
measure ||X'*y||^2. Free MATLAB PLS Toolboxes are available
from non-MATLAB sources. (Search on PLS Paige Miller).
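
For readers without a PLS toolbox, the construction described in item 3 can be sketched in a few lines of NumPy. This is only a minimal single-response (PLS1) sketch, not the code of any toolbox mentioned in the thread; the function name pls1 and the synthetic data are assumptions made for illustration. Each weight vector is taken proportional to X'*y on the deflated predictors, which gives mutually orthogonal score vectors.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 by successive deflation of X (a NIPALS-style sketch).

    Returns coefficients for centered data and the score matrix T.
    """
    Xk = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xk.shape
    W = np.zeros((p, n_components))   # weight vectors
    P = np.zeros((p, n_components))   # X loadings
    T = np.zeros((n, n_components))   # scores (orthogonal combinations)
    q = np.zeros(n_components)        # y loadings
    for k in range(n_components):
        w = Xk.T @ yc                 # unit direction maximizing (w' Xk' y)^2
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = (yc @ t) / tt
        W[:, k], T[:, k] = w, t
        Xk = Xk - np.outer(t, P[:, k])  # remove what this component explains
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, T

# Synthetic, made-up data: four predictors sharing one latent factor.
rng = np.random.default_rng(0)
n, p = 50, 4
z = rng.standard_normal((n, 1))
X = z + 0.5 * rng.standard_normal((n, p))
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + 0.3 * rng.standard_normal(n)

b_pls, T = pls1(X, y, n_components=2)    # a 2-component approximate solution
b_full, _ = pls1(X, y, n_components=p)   # all components
b_ols = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]

gram = T.T @ T
print("off-diagonal of T'T:", gram[0, 1])
print("max |b_full - b_ols|:", np.abs(b_full - b_ols).max())
```

With as many components as predictors (and full-rank X) the PLS coefficients coincide with the least-squares ones; stopping earlier gives the sequence of approximate solutions.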

Analysis of the above results can, in some cases, lead to
a practical ad hoc method of eliminating redundant
and/or irrelevant variables.

PLS does not in general lead to eliminating redundant variables,
although I suppose you can do that if you really want to. (I
can't tell whether Greg Heath meant that PLS leads to an ad
hoc method of eliminating redundant variables, or that the
methods he lists, taken together, lead to methods that can eliminate
variables.)

I meant all three.

I would get the idea of eliminating redundant variables
out of your head (unless of course variables are truly linearly
dependent). In real data, noise among the predictor variables usually
means that the one that SEEMS to correlate best with your response may
in fact correlate best because of noise, and if you pick this one for
your model, you may be eliminating the true causal variable. When you
have highly correlated predictors, you are not going to be able to
empirically pick the right variable, the truly causal variable, so why
even try? What's the point?
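
A small simulation can illustrate the point about noise. The sketch below is an assumption-laden toy, not anything from the thread: x1 is a hypothetical causal predictor, x2 is a noisy near-copy of it, the response depends on x1 only, and the sample size, noise levels, and seed are arbitrary. It counts how often the non-causal x2 shows the larger sample correlation with the response.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n = 2000, 30   # assumed: 2000 repeated samples of 30 observations

x1 = rng.standard_normal((n_trials, n))              # hypothetical causal predictor
x2 = x1 + 0.2 * rng.standard_normal((n_trials, n))   # highly correlated near-copy
y = x1 + rng.standard_normal((n_trials, n))          # response depends on x1 only

def row_corr(a, b):
    """Pearson correlation computed separately for each row (each sample)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))

r1 = row_corr(x1, y)
r2 = row_corr(x2, y)

# Fraction of samples in which the non-causal variable looks "best".
wrong_pick = float(np.mean(np.abs(r2) > np.abs(r1)))
print(f"non-causal x2 has the larger |correlation| in {wrong_pick:.0%} of samples")
```

Under these assumptions the wrong variable should win in a sizeable minority of samples, so picking by sample correlation alone is unreliable.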

Very often the sample size isn't large enough to yield accurate
weight estimates when the number of variables is large. I see
PLS as a more logical choice for variable reduction than PCR.

Get that idea of eliminating variables out of your head in this situation.

I can't agree if the sample size is small relative to the number of
variables.

In this case of course, putting too much emphasis on the
importance of the individual chosen variables is ill-advised.

The advantages of PLS are: it does not assume that you can tell from
the data which variables to keep and which variables to delete, so you
generally keep all variables (at least in many applications); it is
specifically designed for cases where the predictor variables are
correlated, and in those cases Frank and Friedman (Technometrics,
1993) show that the PLS predictions have lower MSE than OLS regression
predictions, and that PLS regression coefficients have lower MSE than
OLS regression coefficients; and finally an advantage that no one
seems to write about, but one that I find extremely helpful -- PLS
lends itself to graphical display of results better than any other
method of multiple regression that I have seen.
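
The MSE comparison can be made concrete with a toy simulation. The sketch below does not reproduce Frank and Friedman's study; the data-generating setup (ten predictors driven by one shared latent factor, equal true coefficients, 20 training cases), the seed, and the helper names are all assumptions, and the setup deliberately favors a one-component PLS fit. It compares coefficient MSE and out-of-sample prediction MSE for OLS and one-component PLS.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, p, n_trials = 20, 1000, 10, 200   # assumed sizes
b_true = np.full(p, 0.1)   # assumed: every predictor carries equal weight

def make_data(n):
    z = rng.standard_normal((n, 1))              # shared latent factor
    X = z + 0.1 * rng.standard_normal((n, p))    # highly correlated columns
    y = X @ b_true + rng.standard_normal(n)
    return X, y

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_pls_one_component(X, y):
    w = X.T @ y                  # first PLS weight vector, proportional to X'y
    w /= np.linalg.norm(w)
    t = X @ w                    # first score vector
    return w * (t @ y) / (t @ t)

coef_err = {"ols": [], "pls": []}
pred_err = {"ols": [], "pls": []}
for _ in range(n_trials):
    X, y = make_data(n_train)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X_new, y_new = make_data(n_test)
    for name, fit in (("ols", fit_ols), ("pls", fit_pls_one_component)):
        b = fit(X - x_mean, y - y_mean)          # fit on centered data
        y_hat = y_mean + (X_new - x_mean) @ b
        coef_err[name].append(np.mean((b - b_true) ** 2))
        pred_err[name].append(np.mean((y_new - y_hat) ** 2))

coef_mse = {k: float(np.mean(v)) for k, v in coef_err.items()}
pred_mse = {k: float(np.mean(v)) for k, v in pred_err.items()}
print("coefficient MSE:", coef_mse)
print("prediction MSE: ", pred_mse)
```

A different coefficient pattern or a larger sample can narrow or reverse the gap, so this is an illustration of the claim rather than evidence for it in general.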

Thanks, Paige.

Hope this helps.

Greg
