Re: Logistic Regression
- From: Graham Jones <graham@xxxxxxxxxxx>
- Date: Fri, 27 May 2005 13:23:46 +0100
In article <1117188427.541938.211140@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
clemenr@xxxxxxxxxx writes
[...]
>Now, I think I can see why the weights would go to infinity if the
>problem is too easy, as the instances are labelled 1 and 0 and,
>assuming infinite numerical precision, then likelihood will always
>increase the further the probabilities of the instances are pushed
>towards extreme values, and reaching 1 and 0 requires infinite weights,
>no? In the case where the classes cannot be perfectly separated, the
>decreasing increase in likelihood obtained by pushing the probabilities
>of the correctly classified instances towards the extremes will
>eventually be outweighed by even a few misclassified instances, no?
I'll let Ray explain his table, but you've got the basic point now. The
way I'd put it is that linearly separable means there is a vector a and
a scalar b such that a.x+b is always negative when x comes from one class
and positive when x comes from the other. (a and b define the separating
hyperplane.) Your probabilities are of the form (1+exp(a.x+b))^{-1}, so if
such a,b exist, you can push every probability closer to 0 or 1 by
multiplying a and b by a number bigger than 1. The likelihood therefore
keeps increasing and is never maximised at finite weights.
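
A small sketch of the scaling argument above, in Python. The toy data, the
function names and the particular a and b are made up for illustration; the
only thing taken from the discussion is the form (1+exp(a.x+b))^{-1}. With
that form, the class labelled 1 is the one on the negative side of the
hyperplane.

```python
import math

def prob(a, b, x):
    """Probability in the form used above: (1 + exp(a.x + b))^-1."""
    z = sum(ai * xi for ai, xi in zip(a, x)) + b
    return 1.0 / (1.0 + math.exp(z))

def log_likelihood(a, b, data):
    """Sum of the log-probabilities of the observed 0/1 labels."""
    total = 0.0
    for x, y in data:
        p = prob(a, b, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical separable data: label 1 where a.x + b < 0, label 0 where > 0.
data = [((-2.0,), 1), ((-1.0,), 1), ((1.0,), 0), ((2.0,), 0)]
a, b = (1.0,), 0.0  # one separating hyperplane (assumed, not fitted)

# Multiply a and b by growing factors and record the log-likelihood each time.
scales = [1, 2, 4, 8]
lls = [log_likelihood(tuple(s * ai for ai in a), s * b, data) for s in scales]
for s, ll in zip(scales, lls):
    print(s, ll)
```

Each larger scale factor should give a log-likelihood closer to 0 (its upper
bound), which is why an unconstrained fit keeps inflating the weights.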
>I do get good fits sometimes for more than 9 attributes. However, my
>program does give up when the change in weights is less than a
>constant. As the derivative of the logistic curve gets closer and
>closer to 0 as the probability nears 1 or 0, I presume that the changes
>in the weights will also slow. Hence, my program may cease iterations,
>and return a set of weights that haven't reached maximum likelihood.
I think the change in weights should be fast with unlimited accuracy,
but with limited accuracy there will come a point where
(1+exp(a.x+b))^{-1} is indistinguishable from 0 or 1.
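
To make the limited-accuracy point concrete, here is a rough sketch assuming
ordinary IEEE double precision (Python floats). The helper names and the
step size are my own choices. It searches for the first z at which
(1+exp(-z))^{-1} compares equal to 1.0, and checks that the slope p(1-p) of
the logistic curve is then exactly zero.

```python
import math

def prob(z):
    """(1 + exp(z))^-1 evaluated in double precision."""
    return 1.0 / (1.0 + math.exp(z))

def first_saturation(step=1.0, limit=1000.0):
    """Smallest multiple of step at which prob(-z) compares equal to 1.0."""
    z = step
    while z <= limit:
        if prob(-z) == 1.0:
            return z
        z += step
    return None

z_sat = first_saturation()

# Slope of the logistic curve, p * (1 - p), at the saturation point.
p_sat = prob(-z_sat)
slope_at_sat = p_sat * (1.0 - p_sat)

print("saturates at z =", z_sat)
print("slope there    =", slope_at_sat)
```

Once a.x+b is that large in magnitude for every instance, the computed
gradient carries no information, so a stopping rule based on the change in
weights will fire whether or not the fit has converged.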
>I
>also presume that where my program goes crazy and starts assigning
>silly probabilities when it was "almost there" is due to floating point
>overflow in the matrix algebra. If that is the case, then it might be
>possible to prevent this by putting a strict limit on the absolute
>values of weights. This of course would mean that I'm not going to get
>the maximum likelihood fit, but that may be better than no fit at all.
Yes. You don't really want the ML fit anyway. There are various ways of
dealing with the situation, and I'm not sure which is most suitable for
you: a prior on the weights, or jitter or weight decay as used in neural
nets, and no doubt others.
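
As one illustration of the weight decay option, here is a minimal sketch
using plain gradient ascent on the log-likelihood minus (lam/2)*a^2. This is
not the poster's program; the data, the learning rate, lam and the choice to
leave b unpenalised are all assumptions. The probability keeps the form
(1+exp(a.x+b))^{-1}, for which the derivative of the log-likelihood with
respect to a.x+b is p - y.

```python
import math

def fit(data, lam, iters, lr=0.1):
    """Gradient ascent on log-likelihood - (lam/2) * a**2, one attribute."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        grad_a, grad_b = 0.0, 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(a * x + b))
            grad_a += (p - y) * x  # d(log-lik)/da for this instance
            grad_b += (p - y)      # d(log-lik)/db for this instance
        a += lr * (grad_a - lam * a)  # the lam * a term is the weight decay
        b += lr * grad_b              # intercept left unpenalised (a choice)
    return a, b

# Hypothetical separable data: label 1 for negative x, label 0 for positive x.
data = [(-2.0, 1), (-1.0, 1), (1.0, 0), (2.0, 0)]

a_free_2k, _ = fit(data, lam=0.0, iters=2000)
a_free_4k, _ = fit(data, lam=0.0, iters=4000)
a_decay_2k, _ = fit(data, lam=0.1, iters=2000)
a_decay_4k, _ = fit(data, lam=0.1, iters=4000)

print("no decay:  ", a_free_2k, a_free_4k)
print("with decay:", a_decay_2k, a_decay_4k)
```

Without decay the weight is still growing between 2000 and 4000 iterations;
with decay it settles at a finite value. A Gaussian prior on the weights
leads to the same kind of penalty, with lam set by the prior variance.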
--
Graham Jones
http://www.visiv.co.uk
Emails to graham@xxxxxxxxxxx may be deleted as spam
Please add a j just before the @ to ensure delivery