Re: Demonstration that least squares give maximum likelihood
- From: "Daddy Tadpole" <nobody@xxxxxxxxxx>
- Date: Tue, 21 Oct 2008 14:41:59 +0200
Many thanks for taking the trouble to reformulate the question, so I now think I know what I wanted to know.
In case anyone else is listening I'll flesh out some details for chemical/pharmaceutical analysis (quality control).
"Brian Borchers" <borchers.brian@xxxxxxxxx> a écrit dans le message de news:addef96b-f296-4763-914d-4e8bc2f55ad9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> It isn't; I know that much; I just want to know if there exists a comprehensible demonstration / outline proof that the most probable result is obtained by minimising the least squares.
Perhaps a concrete example will help to clarify your question. At what stage in the following do you have questions?
1. Suppose that we have an object of mass m that we will weigh
repeatedly. Call the measurements m(1), m(2), ..., m(n).
That's fine for this question.
In practice we weigh multiple aliquots of a standard and the substance being examined, measure the signal from an instrument and calculate the assay using the signal from the standard (single point standardisation or (inverse) regression analysis).
During method validation, we determine variability (repeatability) between operators, balances, instruments, labs, etc.
Sample inhomogeneity is suspected if the scatter of the results exceeds historical or permitted levels. This is critical for the safety and efficacy of finished pharmaceutical products, and there's a protocol in the Pharmacopoeias for evaluating this.
2. Assume that each of the measurements is independent of the other
measurements. This assumption can easily be relaxed but it probably
isn't the source of your confusion, so we'll go ahead and make that
assumption.
We can make that assumption. If, during validation, we make repeated measurements on an aliquot, we should normally take the average of the measurements in order to maintain independence for subsequent calculations.
3. Assume that each of the measurements is normally distributed with
mean m and standard deviation sigma. The assumption that each
measurement has the same standard deviation can easily be relaxed, but
again it's a minor technical point that probably isn't the source of
your confusion.
It is generally accepted in our field that you can usually assume normality, partly because the errors have multiple causes.
Homogeneity of variance can be a problem. For example, the dominant source of random variation can be in the supposedly constant volume delivered by an automatic sampler (so it's the relative standard deviation that tends to be constant). For the case I'm thinking of (assay), you can restrict the working range appropriately. At other times you know that this inhomogeneity of variance leads to overestimating the uncertainty of the result if you don't use weighting, and this may be acceptable because you're erring on the side of caution.
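When the relative standard deviation is roughly constant, the usual remedy is weighted least squares: each squared residual is divided by its own variance, sigma_i^2, so the noisier high-signal points count for less. Here is a minimal sketch of a weighted straight-line fit using the standard closed-form sums. The calibration data and the 2% RSD are made up, `weighted_line_fit` is a hypothetical helper, and estimating each sigma_i from the observed signal is a simplifying assumption (the fitted value or a variance model could be used instead).

```python
def weighted_line_fit(x, y, sigma):
    """Fit y = a + b*x by minimising sum(((y_i - a - b*x_i) / sigma_i)**2)."""
    w = [1.0 / s**2 for s in sigma]  # weight = 1 / variance
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * swxx - swx**2
    b = (sw * swxy - swx * swy) / denom  # slope
    a = (swy - b * swx) / sw             # intercept
    return a, b


# Made-up calibration data (concentration, instrument signal).
conc = [10.0, 20.0, 40.0, 80.0, 160.0]
signal = [0.101, 0.198, 0.405, 0.792, 1.610]
rsd = 0.02  # assumed constant 2% relative standard deviation
sigma = [rsd * s for s in signal]  # assumption: sd proportional to the observed signal
a, b = weighted_line_fit(conc, signal, sigma)
print(f"intercept = {a:.5f}, slope = {b:.6f}")
```

Multiplying every sigma_i by the same constant leaves the fitted line unchanged, so only the relative weights matter; the exact RSD value is not needed for the fit itself.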
Outliers also have multiple known causes, but the cause of a given incident is generally unknowable retrospectively. Examples: the free-floating discharge in the lamp of a spectrometer can flicker for a few seconds only once every few days on average, a valve being turned may leak just occasionally, and you get the odd bubble in a liquid flow system. The pharmacopoeias deal with the problem by fixing rules for analysing additional samples; these arrangements also partly get round the problem that (usually) 5% of initial results would be expected to fall falsely out of specification because of random analytical errors.
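The 5% figure can be checked with one line of arithmetic, under an assumption the post does not spell out: that the acceptance limits sit 1.96 analytical standard deviations either side of the result expected for a batch that is exactly on target, and that the analytical errors are normal. Under those assumptions, the fraction of results falling outside the limits purely through random error is:

```python
from statistics import NormalDist

# Assumption (not stated in the post): limits at +/- 1.96 analytical sd around
# the value expected for an on-target batch, with normally distributed errors.
z = 1.96
p_outside = 2 * (1 - NormalDist().cdf(z))  # probability in both tails
print(f"fraction falsely out of specification: {p_outside:.4f}")
```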
Another responder alluded to suspicions that, outliers apart, the tips of the tails of real analytical data distributions are heavier and longer than those of the normal distribution. That's a separate research subject.
If you drop the assumption of normality, then all of the details of the following calculations will change, because they depend on the form of the normal distribution.
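One way to make steps 1 to 3 concrete for a non-mathematical audience is to simulate them. This sketch (Python, standard library only) draws n independent weighings from a normal distribution with a common sigma. The true mass, sigma, n and seed are made-up values, and `simulate_weighings` is a hypothetical helper written for this illustration.

```python
import random
import statistics


def simulate_weighings(true_mass, sigma, n, seed=None):
    """Draw n independent weighings, each normal with mean true_mass and sd sigma.

    Hypothetical helper encoding assumptions 1-3: one object, independent
    measurements, normal errors with a common standard deviation.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_mass, sigma) for _ in range(n)]


# Made-up values: a 100.0 mg object, balance sd 0.1 mg, 20 weighings.
weighings = simulate_weighings(100.0, 0.1, 20, seed=1)
print(f"mean = {statistics.mean(weighings):.3f} mg")
print(f"sd   = {statistics.stdev(weighings):.3f} mg")
```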
4. The probability density for a single measurement m(i) is
f(m(i)) propto exp(-(m-m(i))^2/(2*sigma^2))
By "propto" I mean "is proportional to." I've left out a scaling
factor of 1/(sigma*sqrt(2*pi)) that won't be important. This is
simply a normal probability distribution.
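The density in step 4, including the scaling factor that was left out, can be written directly and compared with the normal distribution in Python's standard library (`statistics.NormalDist`, Python 3.8+). The true mass and sigma are made-up values, and `normal_density` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def normal_density(x, mean, sigma):
    """Normal density, including the 1/(sigma*sqrt(2*pi)) scaling factor."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


m, sigma = 100.0, 0.1  # made-up true mass (mg) and balance sd (mg)
reference = NormalDist(mu=m, sigma=sigma)
for reading in (99.9, 100.0, 100.15):
    # Both columns should agree; the exp(...) part alone is the "propto" version.
    print(reading, normal_density(reading, m, sigma), reference.pdf(reading))
```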
5. The probability density for the entire collection of measurements
is
f(m(1),...,m(n)) propto exp(-sum((m(i)-m)^2/(2*sigma^2),i=1..n))
Here m(1), m(2), ..., m(n) are random variables and m is a fixed (but unknown to us) constant. All that I've done is taken the product of the probability distributions for the individual measurements, and combined the exponents. Conveniently, the propto hides the scaling factors.
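The bookkeeping in step 5 can be checked numerically: multiplying the individual densities gives the same number as a single exponential of the summed exponents, with the scaling factor raised to the power n. The readings below are made up, independence is assumed as in step 2, and both function names are hypothetical.

```python
import math


def joint_density(readings, m, sigma):
    """Product of the individual normal densities (independence assumed)."""
    result = 1.0
    for r in readings:
        result *= math.exp(-(r - m) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    return result


def joint_density_combined(readings, m, sigma):
    """The same quantity written as one exponential of the summed exponents."""
    n = len(readings)
    ss = sum((r - m) ** 2 for r in readings)
    return math.exp(-ss / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi)) ** n


readings = [100.02, 99.95, 100.08, 99.99, 100.04]  # made-up weighings (mg)
print(joint_density(readings, 100.0, 0.1))
print(joint_density_combined(readings, 100.0, 0.1))
```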
6. Given a fixed collection of measurements m(1), m(2), ..., m(n),
define the likelihood of an estimate, mest, of m to be
L(mest) propto exp(-sum((m(i)-mest)^2/(2*sigma^2),i=1..n))
This is almost but not quite the same as the expression in step 5.
Note that the m(i) are no longer random variables; they're now fixed data, the numbers that you actually got when you did the measurements. Furthermore, mest is a parameter that we will be fiddling with, not the (unknown) true mass m.
L(mest) is called the "likelihood" of mest.
This is the one: I take it you are saying that the Gaussian function gives the likelihood of an estimate, and consequently (below) the maximum likelihood. The sticking point is that the derivations of this function I have seen are too complicated and abstract for someone in my position.
Would it be valid and sufficient, for a mathematically challenged audience, to demonstrate by examples (without proof) that the binomial distribution with p close to 0.5 (coin tossing) and a reasonably large value of n is a good approximation to the Gaussian function? I can cope with the notion of shifting from discrete to continuous variables.
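That kind of demonstration is straightforward to set up. This sketch compares the exact binomial probabilities for the number of heads in n = 100 fair coin tosses with the normal density having the same mean (n*p = 50) and standard deviation (sqrt(n*p*(1-p)) = 5). It illustrates the de Moivre-Laplace approximation rather than proving it; n, p and the values of k printed are arbitrary choices.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability of k heads in n tosses with head probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


n, p = 100, 0.5
mean = n * p                     # 50
sd = math.sqrt(n * p * (1 - p))  # 5
for k in (40, 45, 50, 55, 60):
    print(f"k={k}: binomial {binomial_pmf(k, n, p):.5f}  normal {normal_pdf(k, mean, sd):.5f}")
```

Rerunning with a smaller n (say 10) shows the agreement getting visibly worse, which is part of the point.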
7. The maximum likelihood principle tells us to pick the estimate,
mhat, that maximizes L(mest). It should be obvious (take the
logarithm) that this maximum is obtained by minimizing
min sum((m(i)-mest)^2/(2*sigma^2),i=1..n)
or just
min sum((m(i)-mest)^2,i=1..n)
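Step 7 can be checked numerically. This sketch evaluates both the log-likelihood and the plain sum of squares over a fine grid of candidate estimates for a fixed set of made-up readings. Because the log-likelihood is just the sum of squares multiplied by the negative constant -1/(2*sigma^2), the grid point that maximises one is the grid point that minimises the other, and it lands on the sample mean to within the grid spacing. The readings, sigma and grid are assumptions for illustration.

```python
readings = [100.02, 99.95, 100.08, 99.99, 100.04]  # made-up weighings (mg)
sigma = 0.1  # assumed known balance sd (mg)


def sum_of_squares(mest):
    return sum((r - mest) ** 2 for r in readings)


def log_likelihood(mest):
    # log of exp(-SS / (2 sigma^2)); the dropped scaling factor would only add a constant
    return -sum_of_squares(mest) / (2 * sigma**2)


# Candidate estimates from 99.900 to 100.100 mg in steps of 0.001 mg.
grid = [99.9 + i * 0.001 for i in range(201)]
best_by_likelihood = max(grid, key=log_likelihood)
best_by_least_squares = min(grid, key=sum_of_squares)
mean = sum(readings) / len(readings)
print(f"max likelihood at {best_by_likelihood:.3f}, "
      f"least squares at {best_by_least_squares:.3f}, mean {mean:.3f}")
```

Changing sigma rescales the log-likelihood but does not move the best grid point, which is why sigma can be dropped in the second form of the minimisation.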
8. The solution to this optimization problem is (by differentiating with respect to mest and setting the derivative equal to 0):
mhat=sum(m(i),i=1..n)/n.
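For readers who want step 8 spelled out, here it is as a worked derivation in LaTeX, in the thread's notation (mest written m_est, mhat written \hat m, and \bar m introduced here for the sample mean). The last line is the standard "completing the square" identity: its first term does not depend on m_est and its second term is never negative, so S is smallest exactly when m_est equals the mean. That version needs only algebra, which may suit an audience that has never met calculus.

```latex
% Sum of squares as a function of the trial estimate m_est
S(m_{\mathrm{est}}) = \sum_{i=1}^{n} \bigl(m(i) - m_{\mathrm{est}}\bigr)^2

% With calculus: set the first derivative to zero; the second derivative is positive
\frac{dS}{dm_{\mathrm{est}}} = -2\sum_{i=1}^{n} \bigl(m(i) - m_{\mathrm{est}}\bigr)
  = -2\Bigl(\sum_{i=1}^{n} m(i) - n\,m_{\mathrm{est}}\Bigr) = 0
  \quad\Longrightarrow\quad
  \hat{m} = \frac{1}{n}\sum_{i=1}^{n} m(i) = \bar{m},
  \qquad \frac{d^2 S}{dm_{\mathrm{est}}^2} = 2n > 0

% Without calculus: the cross term vanishes because \sum_i (m(i) - \bar{m}) = 0
S(m_{\mathrm{est}}) = \sum_{i=1}^{n} \bigl(m(i) - \bar{m}\bigr)^2
  + n\,\bigl(\bar{m} - m_{\mathrm{est}}\bigr)^2
```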
7 and 8 are nice and clear. Is it correct, then, to state that we have *deduced* that the mean is the most likely value, a result that is fundamentally (if you've never heard of calculus) the conclusion of an iterative thought process, and not necessarily intuitive? This matters because there also happens to be an analytic solution for the linear least-squares fit to a straight line (as for many calibration curves), but if your calibration function is more complicated, you or your computer have to resort to iteration.
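The two routes can be shown side by side. This sketch fits a straight line y = a + b*x by ordinary least squares twice: once with the textbook closed-form formulas, and once by a simple iterative search (plain gradient descent) on the same sum of squares. Gradient descent is used here only because it is short; nonlinear calibration software typically relies on more robust iterative methods such as Gauss-Newton or Levenberg-Marquardt. The data, step size and number of steps are made-up values chosen so that the iteration converges for this example.

```python
def line_fit_closed_form(x, y):
    """Ordinary least squares for y = a + b*x via the analytic formulas."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def line_fit_iterative(x, y, steps=2000, step_size=0.05):
    """Same objective, minimised by plain gradient descent from a = b = 0."""
    n = len(x)
    a, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Gradient of the mean squared residual with respect to a and b
        grad_a = -2 * sum(residuals) / n
        grad_b = -2 * sum(r * xi for r, xi in zip(residuals, x)) / n
        a -= step_size * grad_a
        b -= step_size * grad_b
    return a, b


# Made-up calibration data (concentration, signal).
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
signal = [2.1, 3.9, 6.2, 7.8, 10.1]
print("closed form:", line_fit_closed_form(conc, signal))
print("iterative:  ", line_fit_iterative(conc, signal))
```

Both routes minimise the same sum of squares, so they should agree to many decimal places; the iteration simply gets there step by step instead of in one formula.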
Again, many thanks for your help
Regards