Re: Calculate the entropy using mu and sigma?



No, you're right. The result I alluded to only holds for continuous
distributions (possibly only for differentiable ones). In other
words, a Gaussian is the maximum entropy *continuous* distribution
subject to a given mean and variance. My fault for being sloppy.
Thanks for pointing it out, Daniel.

Michael

Wikipedia has the same mistake here:

http://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution

It's not just a "sin of omission," as it specifically states that it
is maximal for "all distributions on the real line." Plus, it comes
immediately after a definition of discrete distributions! I'd fix it
myself, but as I don't know what the proper class of functions is I
can't replace it with a correct statement. Anyone out there up to the
task?

It's shown using the calculus of variations with Lagrange
multipliers. I'm by no means an expert, but the calculus of
variations article states (at the bottom of the section "The
Euler-Lagrange equation") that the function f (in this case the
probability density function) "is required to have two continuous
derivatives." I think that's the correct statement, i.e., that among
all distributions whose density has two continuous derivatives, the
normal distribution is the maximum entropy distribution, subject to a
given mean and variance.
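For anyone who wants to see the claim numerically before reading the derivation, here is a rough sanity check, not a proof. It integrates -f log f with a plain midpoint rule for four densities that share the same mean and variance. The values of MU and SIGMA and the choice of comparison densities (logistic, Laplace, uniform) are my own assumptions for illustration; only the logistic is smooth, the other two are included just to see what happens.

```python
import math

def differential_entropy(pdf, lo, hi, n=100_000):
    """Approximate -integral pdf(x) log pdf(x) dx on [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * h)
        if p > 0.0:                      # treat 0 * log 0 as 0
            total -= p * math.log(p) * h
    return total

MU, SIGMA = 1.0, 2.0                     # arbitrary illustrative values

def normal_pdf(x):
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

# Logistic with scale s has variance s^2 * pi^2 / 3.
S_LOGISTIC = SIGMA * math.sqrt(3.0) / math.pi
def logistic_pdf(x):
    return 1.0 / (4.0 * S_LOGISTIC * math.cosh((x - MU) / (2.0 * S_LOGISTIC)) ** 2)

# Laplace with scale b has variance 2 * b^2.
B_LAPLACE = SIGMA / math.sqrt(2.0)
def laplace_pdf(x):
    return math.exp(-abs(x - MU) / B_LAPLACE) / (2.0 * B_LAPLACE)

# Uniform on [MU - w, MU + w] has variance w^2 / 3.
W_UNIFORM = SIGMA * math.sqrt(3.0)
def uniform_pdf(x):
    return 1.0 / (2.0 * W_UNIFORM) if abs(x - MU) <= W_UNIFORM else 0.0

# Integration window wide enough that the neglected tails are negligible.
LO, HI = MU - 40.0 * SIGMA, MU + 40.0 * SIGMA

densities = {
    "normal": normal_pdf,
    "logistic": logistic_pdf,
    "laplace": laplace_pdf,
    "uniform": uniform_pdf,
}
entropies = {name: differential_entropy(pdf, LO, HI) for name, pdf in densities.items()}

# Known closed form for the normal: 0.5 * log(2 * pi * e * sigma^2).
closed_form_normal = 0.5 * math.log(2.0 * math.pi * math.e * SIGMA ** 2)

for name, value in entropies.items():
    print(f"{name:9s} {value:.4f}")
print(f"normal, closed form: {closed_form_normal:.4f}")
```

With these settings the normal should come out on top, and its numerical value should agree with the closed form to several decimal places. Changing MU shifts nothing, and changing SIGMA moves all four entropies by the same amount, so the ordering should not depend on the values picked.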

In case you're interested, the derivation goes something like this:

Using the calculus of variations (link) and Lagrange multipliers
(link), maximize the entropy
  integral(-inf, +inf) -f(x) log f(x) dx
subject to the constraints
  integral(-inf, +inf) f(x) dx = 1      (i.e., the probability
                                        distribution is well-formed)
  integral(-inf, +inf) x f(x) dx = A    (mean)
  integral(-inf, +inf) x^2 f(x) dx = B  (second moment; the variance
                                        follows from it as B - A^2)

Using Lagrange multipliers, this is equivalent to maximizing
  integral(-inf, +inf) [ -f(x) log f(x) + l1 f(x) + l2 x f(x)
                         + l3 x^2 f(x) ] dx
  = integral(-inf, +inf) L(x, f, f') dx
subject to the constraints, where l1, l2, l3 (or lambda1, etc.) are
unknown constants.

The calculus of variations proceeds by treating x, f and f' as
independent arguments of L (that is, pretending f(x) = f is
independent of x, and f' is independent of f and of x), and using the
Euler-Lagrange equation
  - d/dx (pd L/pd f') + pd L/pd f = 0
(where pd = partial derivative; sorry for the typography)

Working this out (L does not depend on f', so the first term is zero):
  [ - d/dx 0 ] + [ -(f * 1/f + log f) + l1 + l2*x + l3*x^2 ] = 0
  -log f + (l1 - 1) + l2*x + l3*x^2 = 0
  f = exp((l1 - 1) + l2*x + l3*x^2)

And working out l1, l2 and l3 from the constraints gives a Normal
distribution with the given mean and variance. QED.
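That last step is compressed, so here is one way to fill it in. To stay independent of sign conventions, I write the exponent generically as a + b x + c x^2, where a, b, c are my own shorthand for whatever combinations of the multipliers sit in front of 1, x and x^2. Likewise \mu and \sigma^2 are my names for the mean A and the variance B - A^2. Treat it as a sketch under those naming assumptions.

```latex
\begin{align*}
f(x) &= \exp\!\left(a + b x + c x^{2}\right),
  \qquad c < 0 \ \text{(otherwise $f$ is not integrable)} \\[4pt]
a + b x + c x^{2}
  &= c\left(x + \frac{b}{2c}\right)^{2} + a - \frac{b^{2}}{4c}
  \qquad \text{(completing the square)} \\[4pt]
\int_{-\infty}^{\infty} x\, f(x)\, dx = A
  &\;\Longrightarrow\; -\frac{b}{2c} = A = \mu \\[4pt]
\int_{-\infty}^{\infty} x^{2} f(x)\, dx = B
  &\;\Longrightarrow\; -\frac{1}{2c} = B - A^{2} = \sigma^{2} \\[4pt]
\int_{-\infty}^{\infty} f(x)\, dx = 1
  &\;\Longrightarrow\; e^{\,a - b^{2}/(4c)} \sqrt{\frac{\pi}{-c}} = 1 \\[4pt]
\text{hence}\quad
c = -\frac{1}{2\sigma^{2}}, \quad
b &= \frac{\mu}{\sigma^{2}}, \quad
a = -\frac{\mu^{2}}{2\sigma^{2}} - \tfrac{1}{2}\log\!\left(2\pi\sigma^{2}\right) \\[4pt]
f(x) &= \frac{1}{\sqrt{2\pi\sigma^{2}}}
  \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
\end{align*}
```

So any density of the form exp(quadratic in x) that satisfies the three constraints is forced to be the normal density with that mean and variance.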

Michael

P.S. The derivations for the uniform and exponential distributions
follow the same logic, just with different constraints and different
limits of integration.
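A quick numerical illustration of those two cases, again a sanity check rather than a proof. I am assuming the usual setups: on [0, +inf) with only the mean fixed, the exponential should have the largest entropy, and on a bounded interval with no further constraint, the uniform should. MEAN, the interval [A_INT, B_INT] and the comparison densities (gamma with shape 2, half-normal, triangular) are my own picks.

```python
import math

def differential_entropy(pdf, lo, hi, n=100_000):
    """Approximate -integral pdf(x) log pdf(x) dx on [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * h)
        if p > 0.0:                      # treat 0 * log 0 as 0
            total -= p * math.log(p) * h
    return total

# Case 1: densities on [0, +inf) sharing the same mean.
MEAN = 1.5                               # arbitrary illustrative value

def exponential_pdf(x):
    return math.exp(-x / MEAN) / MEAN

THETA = MEAN / 2.0                       # gamma(shape=2, scale=THETA) has mean 2*THETA
def gamma2_pdf(x):
    return x * math.exp(-x / THETA) / THETA ** 2

HN_SIGMA = MEAN * math.sqrt(math.pi / 2.0)   # half-normal mean is sigma*sqrt(2/pi)
def half_normal_pdf(x):
    return math.sqrt(2.0 / math.pi) / HN_SIGMA * math.exp(-0.5 * (x / HN_SIGMA) ** 2)

UPPER = 60.0 * MEAN                      # far enough out that the tails are negligible
half_line = {
    "exponential": differential_entropy(exponential_pdf, 0.0, UPPER),
    "gamma2": differential_entropy(gamma2_pdf, 0.0, UPPER),
    "half_normal": differential_entropy(half_normal_pdf, 0.0, UPPER),
}

# Case 2: densities supported on a fixed interval [A_INT, B_INT].
A_INT, B_INT = 2.0, 5.0                  # arbitrary illustrative interval

def uniform_pdf(x):
    return 1.0 / (B_INT - A_INT)

def triangular_pdf(x):
    # Symmetric triangle on [A_INT, B_INT], peak at the midpoint.
    half_width = (B_INT - A_INT) / 2.0
    mid = (A_INT + B_INT) / 2.0
    return max(0.0, 1.0 - abs(x - mid) / half_width) / half_width

interval = {
    "uniform": differential_entropy(uniform_pdf, A_INT, B_INT),
    "triangular": differential_entropy(triangular_pdf, A_INT, B_INT),
}

for name, value in {**half_line, **interval}.items():
    print(f"{name:12s} {value:.4f}")
```

The exponential's value can be compared against its closed form 1 + log(MEAN), and the uniform's against log(B_INT - A_INT).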
P.P.S. Daniel, please feel free to add the appropriate caveat to the
Wikipedia article. Unlike Weird Al, I have never edited Wikipedia,
and actually don't know how.
