Re: Regression with Dichotomous Dependent Variables



On Sun, 29 Mar 2009 11:50:45 -0400, Paul Rubin <rubin@xxxxxxx> wrote:

Rich Ulrich wrote:


What immediately occurs to me is "discriminant function" for two
groups. Isn't the DF model of description appropriate?

Paul >
A discriminant function will predict which of two groups an observation
likely belongs to (here, predicting the choice being made). It won't
necessarily provide a probability estimate, though, whereas a logistic
regression will.

The discriminant function does not give you a direct probability
estimate, and that may be why logistic regression is becoming
more popular in epidemiology. If you ask for case-wise information
from the usual programs, they *will* give you an estimate, among
other information, starting with P(x|group). That matters because it
is possible that one piece of data does not look like either (or any)
of the groups. That could be why (I think) discriminant function is
actively used in anthropology, including criminal forensics.
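
A minimal sketch of the P(x|group) point, assuming two hypothetical univariate normal groups with a common standard deviation and equal priors (the means and the test points are invented for illustration, not taken from any data in this thread). It shows that a posterior can be close to 1 even when the case looks like neither group.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical groups: means 0 and 2, both with sd 1, equal priors.
means = np.array([0.0, 2.0])
priors = np.array([0.5, 0.5])

def describe(x):
    """Return P(x|group) for each group and the posterior P(group|x)."""
    dens = normal_pdf(x, means, 1.0)
    post = priors * dens / np.sum(priors * dens)
    return dens, post

# A case lying between the groups, and one far from both of them.
dens_typical, post_typical = describe(1.5)
dens_outlier, post_outlier = describe(10.0)

print("typical:", dens_typical, post_typical)
print("outlier:", dens_outlier, post_outlier)
```

For the far-away case the posterior for the second group is essentially 1, yet both densities are vanishingly small, which is the information the posterior alone hides.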

When people use logistic regressions for discriminant
analysis (which some do), I think they predict a response of 1 iff the
probability estimate from the logistic regression is > 0.5 (or >= 0.5),
assuming equal priors. (With unequal priors, adjust 0.5 accordingly.)
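
One common reading of "adjust 0.5 accordingly" is a shift on the log-odds scale. The sketch below assumes the fitted probabilities come from a sample whose share of 1s is `sample_prior`, and that the prior you actually believe is `target_prior`; both names and the example values are hypothetical.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability."""
    return np.log(p / (1.0 - p))

def adjust_for_prior(p, sample_prior, target_prior):
    """Shift fitted probabilities from the sample share of 1s to an
    assumed population prior by adding a constant on the logit scale."""
    shifted = logit(p) + logit(target_prior) - logit(sample_prior)
    return 1.0 / (1.0 + np.exp(-shifted))

p_fit = np.array([0.30, 0.50, 0.70])

# Same prior as the sample: nothing changes, so the cutoff stays at 0.5.
same = adjust_for_prior(p_fit, 0.5, 0.5)

# Prior of 2 to 1 in favor of group 1 (2/3 versus 1/3).
two_to_one = adjust_for_prior(p_fit, 0.5, 2.0 / 3.0)

print(same)
print(two_to_one)
```

Under these assumptions, classifying as 1 when the adjusted probability exceeds 0.5 is the same as using a cutoff of 1/3 on the unadjusted probability.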

"Adjust accordingly" is hazardous advice for discriminant analysis.
I don't know how it works out for logistic regression, because I've
never done that with data or paid attention to the theory. But
setting priors at 2 to 1 for DFA (for instance) results in 90% or
so of cases being assigned to the high-prior group, given the usual
moderate R^2 that I have dealt with, 0.75 to 0.80.


There's also a difference in the underlying distributional assumptions.
The Fisher (linear) and Smith (quadratic) discriminant functions
assume that the predictor variables have a multivariate normal
distribution. Logistic regression makes no assumptions about the
predictors (which can in fact be deterministic).

What do you mean by "deterministic"? - The only thing that comes
to mind is what I've always considered a failure of assumptions,
achieving complete separation owing to too few cases to estimate
coefficients. Logistic regression requires a larger N than does OLS,
in the limit, since it fails more easily.
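
A small sketch of why complete separation makes the ML fit fail, using four invented data points and a no-intercept logistic model (both are assumptions made to keep the example short). The log-likelihood keeps rising as the slope grows, so no finite maximum exists.

```python
import numpy as np

# Toy data, perfectly separated at x = 0 (hypothetical values).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
sign = 2 * y - 1  # +1 for y = 1, -1 for y = 0

def log_likelihood(slope):
    """Log-likelihood of the model P(y=1|x) = 1 / (1 + exp(-slope * x))."""
    # log(1 + exp(-sign * slope * x)), computed stably with logaddexp.
    return -np.sum(np.logaddexp(0.0, -sign * slope * x))

slopes = (1.0, 10.0, 100.0)
values = [log_likelihood(b) for b in slopes]

for b, v in zip(slopes, values):
    print(b, v)
```

Each larger slope gives a log-likelihood closer to zero, which is why the usual iterative fitting routines run off toward infinite coefficients or stop with a warning.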

Also, I've always considered that the difference in "underlying
distributional assumptions" is a matter that is exaggerated.
The goal in either ML Logistic regression or OLS regression is to
create a multi-variable predictor *composite* score. The quality
of that composite score does depend on the characteristics of
its components. The need for *normality* is not at all primary,
since we have the same math applied in structured designs of
experiments (with categories) and so on. However, if the predictors
are ordinal but are strongly non-interval, or have extreme
outliers, the quality of the predictor score is brought into question
- in either case - as being a valid, cogent *score*.

LR can swallow up the over-prediction from outliers on one
side, so long as all the cases that are too-strongly predicted
in that direction are, indeed, classified to the proper group.
I don't think that this gives much excuse for starting a
prediction with poor variables.



Not knowing the full context here, it's hard to say if discriminant
analysis would be more or less appropriate, but it's a question worth
raising.

In regression we are actually modeling E[y|x], which is essentially a
mean value given the observed values of x. In the linear probability
model it can be shown that E[y|x] = Pr[y = 1|x], which ranges from 0 to 1,
which in effect makes y continuous and makes regression appropriate.

(Of course, with linear regression you can get predictions > 1, so I
moved on to the logit model.)
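
To illustrate the out-of-range predictions, here is a sketch that fits OLS to a 0/1 response on an invented grid of x values, where the response is 1 whenever x is positive (a deliberately strong relationship, chosen only for illustration).

```python
import numpy as np

# Hypothetical data: one predictor on a grid, response is 1 when x > 0.
x = np.linspace(-3.0, 3.0, 61)
y = (x > 0).astype(float)

# Linear probability model: OLS of the 0/1 response on x with an intercept.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

print("intercept, slope:", coef)
print("smallest and largest fitted value:", fitted.min(), fitted.max())
```

With a relationship this strong the fitted line leaves the [0, 1] interval at both ends, so those fitted values cannot be read as probabilities.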

Unless you *do* get predictions > 1, which only happens when you
have a rather high R^2, there is very little practical difference
between the models, logistic versus discriminant function. The
logistic is less robust with small Ns.

If you do an OLS regression of the dichotomous response variable on the
predictors, the regression function is a scalar multiple of Fisher's
linear discriminant function, so classification is the same either way.
AFAIK, the regression function is not producing probabilities, though.
It's producing "scores". Clearly values < 0 or > 1 cannot be
interpreted as probabilities, but I'm not sure it's fair to interpret
any output of the regression function as probabilities, even though
ordinarily we would interpret the regression function as a conditional
mean and, as noted above, in this case the conditional mean is a
conditional probability.
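
The proportionality between the OLS slopes and Fisher's linear discriminant can be checked numerically. This is a sketch on simulated data: the group sizes, the mean shift, and the seed are arbitrary assumptions. The Fisher direction is taken as the pooled within-group scatter matrix inverse times the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 60, 40

# Simulated predictors for two groups (hypothetical mean shift).
X0 = rng.normal(size=(n0, 3))
X1 = rng.normal(size=(n1, 3)) + np.array([1.0, 0.5, -0.5])
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# OLS of the 0/1 response on the predictors, with an intercept.
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
ols_slopes = beta[1:]

# Fisher direction: solve S_w w = (m1 - m0) with pooled within-group scatter.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
fisher = np.linalg.solve(S_w, m1 - m0)

# If the two vectors are proportional, every component has the same ratio.
ratio = ols_slopes / fisher
print(ratio)
```

The printed ratios agree across components up to rounding, so the two functions rank cases identically. Only the scale and the constant term differ, which is why the cutoff still has to be chosen separately.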

If you want to look at something like probabilities, it is probably
safer to look at the sorted list of predictions for both groups,
and look directly at the overlaps. This is the only choice for
DFA, but not for logistic.
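
One way to look at the overlaps is to sort each group's scores and list the cases that fall in the zone where the two groups intermingle. The scores below are invented for illustration, and defining the zone as the range from the lowest group-1 score to the highest group-0 score is an assumption about what "overlap" should mean here.

```python
import numpy as np

# Hypothetical predicted scores for the two groups.
scores0 = np.array([-1.2, -0.4, 0.1, 0.3, 0.9])
scores1 = np.array([0.2, 0.8, 1.1, 1.5, 2.0])

s0, s1 = np.sort(scores0), np.sort(scores1)

# Overlap zone: from the lowest group-1 score to the highest group-0 score.
low, high = s1[0], s0[-1]
in_zone0 = s0[(s0 >= low) & (s0 <= high)]
in_zone1 = s1[(s1 >= low) & (s1 <= high)]

print("overlap zone:", low, "to", high)
print("group 0 cases in zone:", in_zone0)
print("group 1 cases in zone:", in_zone1)
```

Cases outside the zone are classified without ambiguity by any cutoff inside it. The counts inside the zone show how much the choice of cutoff actually matters.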


Put another way: the linear regression approach violates a cardinal
assumption -- the distribution of the disturbances is now strongly
related to the predictor values. It produces a scalar multiple of the
Fisher function, but I'm not sure it produces a meaningful estimate of
the conditional probability/conditional mean. (I'm also not sure what
happens when the two groups have unequal priors and/or the subsample
sizes are not proportional to the priors. The linear regression slope
vector is still a multiple of the Fisher coefficient vector, but the
constant term in the linear regression may need adjusting, or
equivalently the cutoff for the linear regression output above which you
classify into the 1 group will need adjustment. I'm sure this has all
been worked out long since, but we're getting past my knowledge of such
things.)

/Paul

--
Rich Ulrich


