Re: When and how to do transformation in multiple regression
From: Herman Rubin (hrubin_at_odds.stat.purdue.edu)
Date: 11/29/04
- Maybe in reply to: Jo: "When and how to do transformation in multiple regression"
Date: 29 Nov 2004 10:46:47 -0500
In article <41aa2feb_1@127.0.0.1>,
beliavsky@aol.com <beliavsky@127.0.0.1:7501> wrote:
>jo_chau@hotmail.com (Jo) wrote:
>>I am doing a project of multiple regression. I've found a best subset
>>for the model. I think it is finished. However, I am told that I
>>should do certain transformation about the data because they are
>>nonlinear. I am confused. I don't know when constructing a multiple
>>regression, how to examine whether the data are linear or not, when I
>>should do the transformation and which technique I should choose.
One point to consider: if you have a type of model and
transform the data, your model also gets transformed.
If the model is supposed to be linear in a certain set
of coefficients, after transforming the response this will
generally no longer be the case.
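
To make the point above concrete, here is a small worked sketch. The
one-predictor model and the symbols (beta, gamma, epsilon, eta) are
assumptions chosen for illustration; they do not appear in the original
exchange.

```latex
% Assumed illustrative model with a single predictor x.
% Original model: linear in (\beta_0, \beta_1), additive error.
y = \beta_0 + \beta_1 x + \varepsilon

% Taking logs of the response transforms the model as well:
\log y = \log\!\left(\beta_0 + \beta_1 x + \varepsilon\right)
% This is no longer linear in (\beta_0, \beta_1).

% Conversely, fitting a linear model to \log y,
\log y = \gamma_0 + \gamma_1 x + \eta
\quad\Longleftrightarrow\quad
y = e^{\gamma_0}\, e^{\gamma_1 x}\, e^{\eta}
% asserts an exponential mean function with multiplicative error for y.
```

So regressing log y on x is a different model for y, not the same
model on a more convenient scale.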
>Some possible signs that the response variable should be transformed are
>that
>(1) the distribution of residuals is right-skewed
No; asymmetry of the distribution of the residuals does
mess up testing based on normality, but not the validity
of the model or the estimation procedure. The Gauss-Markov
Theorem establishes least squares as optimal among
a certain class of procedures (linear unbiased estimators),
and this will extend to Bayes procedures or other such
modifications, such as ridge regression, which is really
empirical Bayes.
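
A small simulation can illustrate the claim that skewed errors do not
invalidate the least-squares estimate itself. This is only a sketch on
synthetic data: the true coefficients, the sample sizes, and the choice
of centered exponential errors are all assumptions made for the example.

```python
import numpy as np

def simulate_ols_slopes(n_reps=2000, n=50, beta0=1.0, beta1=2.0, seed=0):
    """Fit OLS repeatedly with right-skewed, mean-zero errors.

    All parameter values are hypothetical, chosen only for illustration.
    Returns the array of estimated slopes, one per replication.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])  # design matrix with intercept
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        # Exponential(1) minus 1: mean zero, strongly right-skewed.
        eps = rng.exponential(1.0, size=n) - 1.0
        y = beta0 + beta1 * x + eps
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes[r] = coef[1]
    return slopes

slopes = simulate_ols_slopes()
# The average estimate should sit close to the true slope of 2.0,
# even though the errors are far from normal.
print("mean slope estimate:", slopes.mean())
```

What the skewness does affect is the small-sample accuracy of t and F
tests that assume normal errors, which this sketch does not examine.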
>(2) the response is always positive
Possibly and possibly not. Some modification is
likely to be needed, but not always.
>(3) the variance of the residuals is positively correlated with the size
>of the predicted response variable
Again, not necessarily.
>These properties would suggest considering a log transformation of the response.
Both together might, if 0 is not a possible value.
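
As a hedged illustration of the case where both conditions hold, the
sketch below builds synthetic data in which the response is strictly
positive and its errors are multiplicative, so the log model is correct
by construction. The helper name `abs_resid_vs_fitted_corr` and all
numeric values are hypothetical. It shows only what the diagnostic looks
like in that favorable case, not that a log transformation is always
the right choice.

```python
import numpy as np

def abs_resid_vs_fitted_corr(x, y):
    """Correlation between |residual| and fitted value for a straight-line fit.

    A clearly positive value suggests the residual spread grows with the
    predicted response. (Hypothetical helper, not from the original post.)
    """
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    return np.corrcoef(np.abs(resid), fitted)[0, 1]

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
# Strictly positive response with multiplicative (lognormal) error.
y = np.exp(0.5 + 1.0 * x + 0.5 * rng.standard_normal(n))

corr_raw = abs_resid_vs_fitted_corr(x, y)          # fit on the raw scale
corr_log = abs_resid_vs_fitted_corr(x, np.log(y))  # fit on the log scale
print("raw scale:", corr_raw)
print("log scale:", corr_log)
```

If the response could be exactly 0, `np.log(y)` would be undefined there,
which is the caveat in the reply above.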
>Transformations of the predictor variables may be indicated if they substantially
>improve the R^2 of the regression and they make sense, indicating that the
>relationship between the response and predictors is nonlinear. For example,
>in predicting longevity from income, I would expect that going from $20,000
>to $100,000 in annual income has a bigger effect on longevity than going
>from $100,000 to $180,000, so a log or power transformation of income (with
>the power between 0 and 1) would be plausible.
Transformation of the predictor variables is somewhat
safer, but the assumptions need to come from the user's
knowledge of the problem, not from statistical considerations.
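
One reason transforming a predictor is safer is that the model stays
linear in its coefficients, so ordinary least squares still applies
directly. The sketch below uses entirely synthetic data loosely modeled
on the income example quoted above; the coefficients, the noise level,
and the generic response `y` are assumptions, not real longevity figures.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Hypothetical incomes spanning the range mentioned in the quoted example.
income = rng.uniform(20_000, 180_000, size=n)

# Assumed data-generating model: response is linear in log(income).
true_b0, true_b1 = 30.0, 4.0
y = true_b0 + true_b1 * np.log(income) + rng.normal(0.0, 1.0, size=n)

# The transformed predictor is just another column of the design matrix,
# so the fit is still ordinary linear least squares.
X = np.column_stack([np.ones(n), np.log(income)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coef

# Implied change in the fitted response for two equal-sized income steps.
gain_low = b1 * (np.log(100_000) - np.log(20_000))
gain_high = b1 * (np.log(180_000) - np.log(100_000))
print("estimated slope on log(income):", b1)
print("20k -> 100k:", gain_low, " 100k -> 180k:", gain_high)
```

The choice of log(income) here was imposed by the simulation. In real
data that choice has to be justified by the subject matter, which is the
point made in the reply.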
>A good book on the subject is "Transformation and Weighting in Regression",
>by Raymond J. Carroll and David Ruppert.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu  Phone: (765)494-6054  FAX: (765)494-0558