Re: a question about maximum likelihood estimation (MLE)
- From: "illywhacker" <illywacker@xxxxxxxxx>
- Date: 9 Mar 2007 15:05:06 -0800
On Mar 5, 4:38 am, "Joe Zhang" <jingq...@xxxxxxxxx> wrote:
Suppose we know a certain transformation of x, say y = theta * x^3,
follows a normal distribution N(beta, 1), where theta and beta are two
parameters to be estimated.
We have two independent observations x1 and x2. That is, there are two
observations for y: y1 = theta * x1^3 and y2 = theta * x2^3. Now we
want to use the MLE method to estimate the two parameters.
Denote the pdf of y as p_y(y) and that of x as p_x(x). We have p_x(x)
= p_y(theta * x^3) * 3*theta*x^2.
One way to estimate theta and beta is to maximize p_x(x1) * p_x(x2):
max_{theta, beta} p_x(x1) * p_x(x2)
= max_{theta, beta} p_y(theta * x1^3) * p_y(theta * x2^3) *
3*theta*x1^2 * 3*theta*x2^2
The other way is to maximize p_y(y1) * p_y(y2):
max_{theta, beta} p_y(theta * x1^3) * p_y(theta * x2^3)
Obviously, these two methods are different because of the additional
factor, 3*theta*x1^2 * 3*theta*x2^2, in the first method. Intuitively, I
know this is because an additional term (the derivative of y with
respect to x: 3*theta*x^2) is added to generate p_x(x) from p_y(y).
This term, however, does not appear when we transform from x1, x2 to
y1, y2.
In fact, if both x and y are discrete random variables, we simply use
p_x(x) = p_y(theta * x^3), instead of p_x(x) = p_y(theta * x^3) *
3*theta*x^2 -- here, p_x(x) and p_y(y) are probability mass
functions. In this case, the two methods above lead to the same
result.
But how can we explain the difference between the two methods when x
and y are both continuous variables?
First, forget beta, as it is irrelevant to the problem.
The confusion arises because you are taking two different limits
without realizing it. Naturally you get two different results. If you
discretized the problem, you would see immediately what was wrong.
In one case, you are using the probability that the data point lies in
the interval [x, x + epsilon]. In the other, you are using the
probability that the data point lies in the interval [y, y + epsilon],
which is equivalent to it lying in the interval [x, x + delta], where,
to first order in epsilon, delta = epsilon / (3 theta x^2).
If this second interval did not depend on theta, then the two limits
would be the same. Since it does depend on theta, they cannot be the
same for all theta, and this is the problem.
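
The two intervals can be checked numerically. The following is only a
minimal sketch, assuming theta > 0 and x > 0; the values of theta, beta,
x and epsilon are illustrative choices, and the function names are
hypothetical helpers, not anything from the original question.

```python
import math

def normal_cdf(z, mean):
    """CDF of the normal distribution N(mean, 1) evaluated at z."""
    return 0.5 * (1.0 + math.erf((z - mean) / math.sqrt(2.0)))

def prob_x_interval(x, width, theta, beta):
    """P(X in [x, x + width]) when theta * X^3 ~ N(beta, 1), theta > 0, x > 0."""
    upper = normal_cdf(theta * (x + width) ** 3, beta)
    lower = normal_cdf(theta * x ** 3, beta)
    return upper - lower

def prob_y_interval(y, width, beta):
    """P(Y in [y, y + width]) for Y ~ N(beta, 1); theta does not appear."""
    return normal_cdf(y + width, beta) - normal_cdf(y, beta)

# Illustrative values (assumptions, not taken from the thread).
theta, beta, x, eps = 2.0, 2.0, 1.0, 1e-4
y = theta * x ** 3

p_x_eps = prob_x_interval(x, eps, theta, beta)  # interval [x, x + eps]
p_y_eps = prob_y_interval(y, eps, beta)         # interval [y, y + eps]

# Width of the x interval that maps exactly onto [y, y + eps].
delta = ((y + eps) / theta) ** (1.0 / 3.0) - x
p_x_delta = prob_x_interval(x, delta, theta, beta)

jacobian = 3.0 * theta * x ** 2
print(p_x_eps / p_y_eps, jacobian)  # ratio of probabilities vs. 3*theta*x^2
print(delta, eps / jacobian)        # exact delta vs. first-order value
```

For small epsilon the ratio of the two probabilities approaches
3 theta x^2, which is exactly the factor that separates the two methods.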
The fact that the distribution for y does not depend on theta has
nothing to do with this effect. It merely simplifies the problem.
Although which case to choose (and there are infinitely many others)
depends on what these quantities actually represent, it seems likely
that if you are given the data in x coordinates, you would want to use
the interval [x, x + epsilon], while if you are given the data in y
coordinates, you would want to use [y, y + epsilon]. In your
particular case, the latter would not allow you to estimate theta at
all, as the probability that the data point lies in [y, y + epsilon]
does not depend on theta.
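
To see how far apart the two objectives from the question can land, here
is a rough sketch that maximizes each over theta. It assumes beta is
known and fixed, theta > 0, and positive data; the data values and
function names are made up for illustration. The closed forms come from
setting the derivative with respect to theta to zero, and a grid search
is used only as a cross-check.

```python
import math

def loglik_x(theta, xs, beta):
    """Log-likelihood in x coordinates, up to a constant.

    Includes the log of the Jacobian factor 3 * theta * x^2.
    """
    return sum(-0.5 * (theta * x ** 3 - beta) ** 2
               + math.log(3.0 * theta * x ** 2) for x in xs)

def objective_y(theta, xs, beta):
    """The second objective from the question: p_y(theta * x^3) without the Jacobian."""
    return sum(-0.5 * (theta * x ** 3 - beta) ** 2 for x in xs)

def argmax_on_grid(f, lo, hi, steps):
    """Brute-force maximizer of f on an evenly spaced grid over [lo, hi]."""
    best = max(range(steps + 1), key=lambda i: f(lo + (hi - lo) * i / steps))
    return lo + (hi - lo) * best / steps

# Hypothetical observations and an assumed known beta.
xs = [0.9, 1.2]
beta = 1.0
n = len(xs)
s1 = sum(x ** 3 for x in xs)  # sum of x_i^3
s2 = sum(x ** 6 for x in xs)  # sum of x_i^6

# Without the Jacobian: ordinary least squares of beta on x^3.
theta_y = beta * s1 / s2
# With the Jacobian: positive root of s2*theta^2 - beta*s1*theta - n = 0.
theta_x = (beta * s1 + math.sqrt((beta * s1) ** 2 + 4.0 * n * s2)) / (2.0 * s2)

grid_x = argmax_on_grid(lambda t: loglik_x(t, xs, beta), 0.01, 3.0, 2990)
grid_y = argmax_on_grid(lambda t: objective_y(t, xs, beta), 0.01, 3.0, 2990)
print(theta_x, grid_x)
print(theta_y, grid_y)
```

The n / theta term contributed by the Jacobian pushes the x-coordinate
estimate above the other one, so the two answers cannot agree.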
The lesson from this is to be wary of limits. If in doubt, discretize
and take the limit at the end of the calculation. (What is the limit
of x^2 / x as x -> 0? If you take the limits of the numerator and
denominator separately, you get 0 / 0, which is meaningless. If you
divide first, you get x, whose limit is 0.)
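
The same point in a few lines of code, as a toy sketch only (the helper
name is hypothetical): keep x finite, divide, and shrink x afterwards.

```python
import math

def ratio(x):
    """Divide first: x**2 / x, evaluated at a finite x."""
    return x ** 2 / x

# Shrink x toward 0 only after the division has been carried out.
values = [ratio(10.0 ** -k) for k in range(1, 8)]

# Taking both limits first leaves 0 / 0, which Python refuses to evaluate.
try:
    ratio(0.0)
    limits_first = "defined"
except ZeroDivisionError:
    limits_first = "undefined"

print(values)
print(limits_first)
```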
For more on this issue, and more examples of what can go wrong when
you ignore this advice (which was the advice of Gauss too), see
'Probability theory: the logic of science' by E. T. Jaynes.
illywhacker