Re: When and how to do transformation in multiple regression

From: Herman Rubin (hrubin_at_odds.stat.purdue.edu)
Date: 11/29/04


Date: 29 Nov 2004 10:46:47 -0500

In article <41aa2feb_1@127.0.0.1>,
beliavsky@aol.com <beliavsky@127.0.0.1:7501> wrote:

>jo_chau@hotmail.com (Jo) wrote:
>>I am doing a project of multiple regression. I've found a best subset
>>for the model. I think it is finished. However, I am told that I
>>should do certain transformation about the data because they are
>>nonlinear. I am confused. I don't know when constructing a multiple
>>regression, how to examine whether the data are linear or not, when I
>>should do the transformation and which technique I should choose.

Point one to consider: if you have a type of model and
transform the data, your model also gets transformed.

If the model is supposed to be linear in a certain set
of coefficients, then after a nonlinear transformation of
the response this will generally no longer be the case.
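
A minimal sketch of this point, assuming a made-up model
y = 2 + 3x with no noise at all (the numbers and the helper
names fit_line and max_abs_residual are illustrative and do
not come from the thread). A straight line fits y exactly,
but it cannot fit log(y), because log(2 + 3x) is not linear
in the coefficients:

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def max_abs_residual(xs, ys):
    """Largest absolute residual from the fitted straight line."""
    a, b = fit_line(xs, ys)
    return max(abs(y - (a + b * x)) for x, y in zip(xs, ys))

xs = [float(i) for i in range(1, 11)]
ys = [2.0 + 3.0 * x for x in xs]        # exactly linear, no noise (assumed model)
log_ys = [math.log(y) for y in ys]      # same data after a log transformation

raw_misfit = max_abs_residual(xs, ys)       # essentially zero
log_misfit = max_abs_residual(xs, log_ys)   # clearly nonzero: systematic curvature
print(raw_misfit, log_misfit)
```

The misfit on the log scale is not noise; it is the
curvature introduced by transforming a model that was
linear on the original scale.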

>Some possible signs that the response variable should be transformed are
>that
>(1) the distribution of residuals is right-skewed

No; asymmetry of the distribution of the residuals does
mess up testing based on normality, but not the validity
of the model or the estimation procedure. The
Gauss-Markov Theorem establishes least squares as optimal
among a certain class of procedures (linear unbiased
estimators), and this will extend to Bayes procedures or
other such modifications, such as ridge regression, which
is really empirical Bayes.
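
A small simulation can illustrate the claim about skewed
residuals. This is only a sketch under assumed settings:
the true line y = 1 + 2x, the x grid, the centered
exponential errors, and the number of replications are all
chosen here for illustration. The errors are strongly
right-skewed but have mean zero, and the least squares
slope still averages out close to the true slope:

```python
import random

def ols_slope(xs, ys):
    """Least squares slope for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

random.seed(0)
TRUE_A, TRUE_B = 1.0, 2.0               # assumed true coefficients
xs = [i / 10 for i in range(50)]        # fixed design points

slopes = []
for _ in range(2000):
    # expovariate(1.0) has mean 1, so subtracting 1 gives
    # mean-zero, right-skewed errors.
    ys = [TRUE_A + TRUE_B * x + (random.expovariate(1.0) - 1.0) for x in xs]
    slopes.append(ols_slope(xs, ys))

mean_slope = sum(slopes) / len(slopes)
print(mean_slope)   # close to TRUE_B despite the skewed errors
```

What the skewness does affect is inference that leans on
normality (t and F tests in small samples), which this
sketch does not examine.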

>(2) the response is always positive

Possibly and possibly not. Some modification is likely
to be needed, but not always.

>(3) the variance of the residuals is positively correlated with the size
>of the predicted response variable

Again, not necessarily.

>These properties would suggest considering a log transformation of the response.

Both together might, if 0 is not a possible value.
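
One situation in which (2) and (3) together do point to a
log transformation is a strictly positive response with
multiplicative errors. The sketch below assumes exactly
that, y = exp(0.5 + 0.8x + e) with normal e; the model, the
coefficients, and the four x levels are made up for
illustration. The spread of y grows with its level, while
the spread of log(y) stays roughly constant:

```python
import math
import random
import statistics

random.seed(1)
levels = [1.0, 2.0, 3.0, 4.0]   # assumed predictor levels
REPS = 200                      # replicates per level

sd_raw = []   # spread of y at each level
sd_log = []   # spread of log(y) at each level
for x in levels:
    # Multiplicative error: y is always positive, so log(y) is defined.
    ys = [math.exp(0.5 + 0.8 * x + random.gauss(0.0, 0.3)) for _ in range(REPS)]
    sd_raw.append(statistics.stdev(ys))
    sd_log.append(statistics.stdev([math.log(y) for y in ys]))

raw_ratio = sd_raw[-1] / sd_raw[0]   # much larger than 1
log_ratio = sd_log[-1] / sd_log[0]   # near 1
print(raw_ratio, log_ratio)
```

If the response can equal 0, log(y) is undefined and this
route is closed, which is the caveat above.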

>Transformations of the predictor variables may be indicated if they substantially
>improve the R^2 of the regression and they make sense, indicating that the
>relationship between the response and predictors is nonlinear. For example,
>in predicting longevity from income, I would expect that the going from $20,000
>to $100,000 in annual income has a bigger effect in longevity than going
>from $100,000 to $180,000, so a log or power transformation of income (with
>the power between 0 and 1) would be plausible.

Transformation of the predictor variables is somewhat
safer, but the assumptions need to come from the user,
not from statistical considerations.
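
To make the quoted income example concrete, here is a short
sketch of what a log or square-root transformation of
income would assume. The unit coefficient and the function
names are hypothetical; only the three income figures come
from the quoted text. Under either transformation the step
from $20,000 to $100,000 counts for more than the equally
wide step from $100,000 to $180,000:

```python
import math

def log_effect(lo, hi, b=1.0):
    """Change in the response under an assumed model y = a + b*log(income)."""
    return b * (math.log(hi) - math.log(lo))

def power_effect(lo, hi, p=0.5, b=1.0):
    """Change in the response under an assumed model y = a + b*income**p."""
    return b * (hi ** p - lo ** p)

first_log = log_effect(20_000, 100_000)
second_log = log_effect(100_000, 180_000)
first_pow = power_effect(20_000, 100_000)
second_pow = power_effect(100_000, 180_000)

print(first_log / second_log)   # > 1: the first step has the bigger effect
print(first_pow / second_pow)   # > 1 as well, for any power between 0 and 1
```

Whether such diminishing returns are plausible is a
judgment about the subject matter; the arithmetic only
shows what the chosen transformation implies.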

>A good book on the subject is "Transformation and Weighting in Regression",
>by Raymond J. Carroll and David Ruppert.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu         Phone: (765)494-6054   FAX: (765)494-0558

