# Re: Regression with Dichotomous Dependent Variables

• From: Paul Rubin <rubin@xxxxxxx>
• Date: Sun, 29 Mar 2009 11:50:45 -0400

Rich Ulrich wrote:

What immediately occurs to me is "discriminant function" for two groups. Isn't the DF model of description appropriate?

A discriminant function will predict which of two groups an observation likely belongs to (here, predicting the choice being made). It won't necessarily provide a probability estimate, though, whereas a logistic regression will. When people use logistic regressions for discriminant analysis (which some do), I think they predict a response of 1 iff the probability estimate from the logistic regression is > 0.5 (or >= 0.5), assuming equal priors. (With unequal priors, adjust 0.5 accordingly.)
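A minimal sketch of that classification rule, assuming a single predictor and hypothetical toy data. The logistic model is fit by maximum likelihood with Newton-Raphson in plain Python, and an observation goes to group 1 when its fitted probability exceeds a cutoff. The function names and the `cutoff` parameter are illustrative. The default of 0.5 corresponds to equal priors; how to move it for unequal priors is left to the caller, as in the text.

```python
import math


def predict_proba(a, b, x):
    """Fitted P(y = 1 | x) under the logistic model with intercept a and slope b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood fit of (a, b) by Newton-Raphson; assumes the groups overlap."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # negative Hessian (information matrix)
        for x, y in zip(xs, ys):
            p = predict_proba(a, b, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [h00 h01; h01 h11] * delta = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def classify(a, b, x, cutoff=0.5):
    """Assign to group 1 iff the estimated probability exceeds the cutoff."""
    return 1 if predict_proba(a, b, x) > cutoff else 0


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical toy data
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    a, b = fit_logistic(xs, ys)
    for x in xs:
        print(x, round(predict_proba(a, b, x), 3), classify(a, b, x))
```

Unlike a bare discriminant score, `predict_proba` returns a number that is always strictly between 0 and 1. The classification is just a threshold applied on top of it.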

There's also a difference in the underlying distributional assumptions. The Fisher (linear) and Smith (quadratic) discriminant functions assume that the predictor variables have a multivariate normal distribution within each group (with a common covariance matrix in the linear case). Logistic regression makes no assumptions about the distribution of the predictors (which can in fact be deterministic).
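For reference, the standard link between the two models: if x | y = k ~ N(mu_k, Sigma) with a common covariance matrix Sigma and prior probabilities pi_k (the Fisher setting), Bayes' rule gives log-odds that are linear in x, which is exactly the form logistic regression fits:

```latex
\log\frac{\Pr(y=1\mid x)}{\Pr(y=0\mid x)}
  = \log\frac{\pi_1}{\pi_0}
  - \tfrac{1}{2}(\mu_1+\mu_0)^\top\Sigma^{-1}(\mu_1-\mu_0)
  + x^\top\Sigma^{-1}(\mu_1-\mu_0)
```

Logistic regression estimates this linear log-odds directly without modeling the distribution of x. With group-specific covariances Sigma_k the quadratic terms in x no longer cancel, which gives Smith's quadratic rule. With equal priors the first term vanishes, and "log-odds > 0" is the same rule as "probability > 0.5".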

Not knowing the full context here, it's hard to say if discriminant analysis would be more or less appropriate, but it's a question worth raising.

In regression we are actually modeling E[y|x], which is essentially a mean value given the observed values of x. In the linear probability model it can be shown that E[y|x] = Pr(y = 1 | x), which ranges from 0 to 1; in effect this makes the quantity being modeled continuous and makes regression appropriate.
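For a 0/1 response the identity follows in one line from the definition of expectation. Writing the linear probability model (LPM) coefficients as beta_0 and beta (notation assumed here), the model then sets that probability equal to a linear function of x:

```latex
\begin{aligned}
E[y \mid x] &= 1\cdot\Pr(y=1\mid x) + 0\cdot\Pr(y=0\mid x) = \Pr(y=1\mid x),\\
\text{LPM:}\quad \Pr(y=1\mid x) &= \beta_0 + \beta^\top x \quad (\text{not constrained to } [0,1]).
\end{aligned}
```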

(Of course, with linear regression you can get predictions > 1, so I moved on to the logit model.)
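A small illustration of the out-of-range problem, as a sketch on hypothetical toy data: an OLS fit of a 0/1 response on one predictor, in plain Python. The function name `fit_ols` is illustrative. Nothing in the least-squares fit keeps a + b*x inside [0, 1] once x moves away from the bulk of the data.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical toy data
    ys = [0, 0, 1, 0, 1, 0, 1, 1]   # dichotomous response coded 0/1
    a, b = fit_ols(xs, ys)
    for x in (0, 4.5, 10):
        # The "linear probability" prediction a + b*x at each x.
        print(x, round(a + b * x, 3))
```

With these data the slope is 5/42, so the prediction is about -0.04 at x = 0 and about 1.15 at x = 10, neither of which is a probability.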

Unless you *do* get predictions > 1, which only happens when you have a rather high R^2, there is very little practical difference between the two models, logistic versus discriminant function. The logistic model is less robust with small Ns.

If you do an OLS regression of the dichotomous response variable on the predictors, the regression function is a scalar multiple of Fisher's linear discriminant function, so classification is the same either way. AFAIK, the regression function is not producing probabilities, though. It's producing "scores". Clearly values < 0 or > 1 cannot be interpreted as probabilities, but I'm not sure it's fair to interpret any output of the regression function as probabilities, even though ordinarily we would interpret the regression function as a conditional mean and, as noted above, in this case the conditional mean is a conditional probability.
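A numerical check of the proportionality claim, written as a sketch with hypothetical two-dimensional toy data and illustrative helper names. It compares the OLS slope vector for the 0/1 response (intercept included) with Fisher's direction S_W^{-1}(m1 - m0), where S_W is the pooled within-group scatter matrix and m0, m1 are the group means. The reason it works: the total scatter equals S_W plus a rank-one term in (m1 - m0), and the OLS right-hand side is itself a multiple of (m1 - m0). Inverting therefore leaves the direction unchanged.

```python
def mean(vectors):
    """Componentwise mean of a list of 2-D points."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(2)]


def scatter(vectors, center):
    """2x2 sum of outer products of deviations from `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - center[0], v[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def solve2(m, r):
    """Solve the 2x2 system m z = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]


def ols_slopes(X, y):
    """OLS slopes of y on X with an intercept (centering absorbs the intercept)."""
    xbar = mean(X)
    ybar = sum(y) / len(y)
    sxy = [sum((x[j] - xbar[j]) * (yi - ybar) for x, yi in zip(X, y)) for j in range(2)]
    return solve2(scatter(X, xbar), sxy)


def fisher_direction(X, y):
    """Fisher's direction S_W^{-1} (m1 - m0) with pooled within-group scatter S_W."""
    g1 = [x for x, yi in zip(X, y) if yi == 1]
    g0 = [x for x, yi in zip(X, y) if yi == 0]
    m1, m0 = mean(g1), mean(g0)
    s1, s0 = scatter(g1, m1), scatter(g0, m0)
    sw = [[s1[i][j] + s0[i][j] for j in range(2)] for i in range(2)]
    return solve2(sw, [m1[0] - m0[0], m1[1] - m0[1]])


if __name__ == "__main__":
    # Hypothetical toy data: four points per group, response coded 0/1.
    X = [(1, 2), (2, 1), (3, 3), (2, 4), (4, 4), (5, 3), (6, 6), (4, 6)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    b = ols_slopes(X, y)
    w = fisher_direction(X, y)
    print("OLS slopes:", b)
    print("Fisher direction:", w)
    print("ratios:", b[0] / w[0], b[1] / w[1])  # equal up to rounding
```

Because the two vectors are proportional with a positive factor, ranking observations by either score gives the same order. Only the cutoff (equivalently, the intercept) has to be chosen consistently.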

Put another way, the linear regression approach violates a cardinal assumption: the distribution of the disturbances now depends strongly on the predictor values. It produces a scalar multiple of the Fisher function, but I'm not sure it produces a meaningful estimate of the conditional probability/conditional mean. (I'm also not sure what happens when the two groups have unequal priors and/or the subsample sizes are not proportional to the priors. The linear regression slope vector is still a multiple of the Fisher coefficient vector, but the constant term in the linear regression may need adjusting, or equivalently the cutoff for the linear regression output above which you classify into the 1 group will need adjustment. I'm sure this has all been worked out long since, but we're getting past my knowledge of such things.)
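To make the point about the disturbances concrete, write p(x) = Pr(y = 1 | x) and let epsilon = y - p(x). Because y is 0 or 1, the disturbance can take only two values at a given x, and its variance is a function of x:

```latex
\begin{aligned}
\varepsilon &= y - p(x) =
  \begin{cases} 1 - p(x) & \text{with probability } p(x),\\
                -p(x)    & \text{with probability } 1 - p(x),\end{cases}\\
E[\varepsilon \mid x] &= 0, \qquad
\operatorname{Var}(\varepsilon \mid x) = p(x)\,[1 - p(x)].
\end{aligned}
```

So the disturbances are neither normal nor homoskedastic: the variance is largest where p(x) is near 0.5 and shrinks toward 0 as p(x) approaches 0 or 1.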

/Paul