Re: Enter versus forward method for linear regression



On 20 Jun 2006 15:43:26 -0700, "Jem" <jomilton@xxxxxxxxxxx> wrote:

Hi

I am fairly new to regression and have so far always used the enter
method, grouping certain blocks of variables. I am generally trying to
establish relationships between the dependent variable and a particular
independent, so only adding other variables that might confound the
relationship. I am not specifically trying to establish the model that
best predicts the dependent.

Googling groups, < group:sci.stat.* model-building > yielded,
among other things --

1. Statistics for Experimenters: An Introduction to Design, Data
Analysis, and Model Building, by George E.P. Box, William G. Hunter
and J. Stuart Hunter. ISBN:0-471-09315-7. Published by Wiley.

2. Applied Linear Statistical Models: Regression, Analysis of Variance
and Experimental Designs, by John Neter, William Wasserman and Michael
H. Kutner. ISBN: 0-256-08338-X. Published by IRWIN.

Also, Judd/McClelland's book on "Data Analysis."
Also, Frank Harrell's book, "Regression Modeling Strategies."


Stepwise selection is not what you want, from what you
describe. Few people should want it. You can check my
stats-FAQ for some old posts, or Google.


I am doing my thesis at the moment and it has been suggested that I
present the coefficients and p-values of all predictors so that readers
can make up their own minds about the strength of relationships. I
have recently tried the forward method, so that I don't end up with so
many predictors (all theoretically related to the dependent), many
of which are not significant. However, as far as I can
gather, this does not then provide you with coefficients and p-values
for all variables and does not allow you to see what happens to the
coefficient of the primary predictor on adding additional predictors.
Please correct me if I am wrong.

Two problems with too many variables --
- You can run out of degrees of freedom and have far too much
capitalization on chance, if your sample is not large enough.
- For making sense, you need to have a good notion of what
the variables are supposed to mean, and how that compares
to what they actually *measure*. That can be a burden when
there are many.

You do want to test what the literature suggests is important.
It can be useful to show one variable in several contexts.

It is often useful to look at what is added by specific *sets*
of variables. Also, SPSS (for one) has a useful option for looking
at tests, as if the variable was entered next, on all the variables-
not-in-the-equation.
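
A rough sketch of testing what a *set* of variables adds, using the
usual F test on the change in residual sum of squares between nested
models. Everything here is an assumption for illustration: the data are
simulated, the names x1, x2, x3 and fit_rss are hypothetical, and this
is plain Python rather than what SPSS does internally.

```python
import random

def fit_rss(rows, y):
    """Fit OLS with an intercept; return the residual sum of squares."""
    X = [[1.0] + list(r) for r in rows]           # prepend intercept column
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, x))) ** 2
               for x, yi in zip(X, y))

# Simulated data (assumption): y depends on x1 and x2, not on x3.
rng = random.Random(1)
n = 100
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * u + 1.5 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

rss_reduced = fit_rss([[u] for u in x1], y)                 # x1 only
rss_full = fit_rss([list(t) for t in zip(x1, x2, x3)], y)   # x1 plus the set {x2, x3}

q = 2                 # number of variables in the added set
df_resid = n - 4      # n minus (intercept + 3 predictors) in the full model
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print(f"F({q}, {df_resid}) for adding the set = {f_stat:.2f}")
```

The statistic is referred to an F distribution with (q, df_resid)
degrees of freedom. With q = 1 the same calculation gives a test of a
single variable as if it were entered next.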

Also, try Robert Abelson's book "Statistics as Principled Argument."


Additionally, if my hypothesis is that the dependent variable (a, for
example) is affected by c (my primary predictor of interest) via a change
in another predictor, b, then should I be adjusting for b by adding it to
the model? Surely this will prevent me from seeing an effect of c on a. I
hope that makes sense. My current idea is to stick with the enter
method that I know best; then I can add b in a separate block from c
and examine the effects on the coefficients. This also allows me to do
the regression easily with or without variable c.

Any help gratefully received. I will probably have a few more questions
once I get responses to these.

It sounds to me like you are starting out in the right direction.
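
A minimal sketch of the block-wise entry described above, showing how
the coefficient of c can shrink once b is in the model. The data are
simulated under an assumed structure (c affects a only through b), the
effect sizes are made up, and the helper name ols is hypothetical; this
is plain Python, not SPSS output.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y.

    rows: predictor rows without the intercept; y: outcomes.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]           # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (assumption): c -> b -> a, with no direct path from c to a.
rng = random.Random(0)
n = 200
c = [rng.gauss(0, 1) for _ in range(n)]
b = [2 * ci + rng.gauss(0, 1) for ci in c]
a = [3 * bi + rng.gauss(0, 1) for bi in b]

block1 = ols([[ci] for ci in c], a)                    # block 1: c only
block2 = ols([[ci, bi] for ci, bi in zip(c, b)], a)    # block 2: c, then b added

print(f"block 1: coef of c = {block1[1]:.2f}")
print(f"block 2: coef of c = {block2[1]:.2f}, coef of b = {block2[2]:.2f}")
```

Under this assumed structure the coefficient of c in block 1 reflects
its total effect on a, while in block 2 it falls toward zero because b
carries the effect. Reporting both blocks shows the reader that change.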

--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html


