Re: Highest Posterior Density
- From: "Reef Fish" <large_nassua_grouper@xxxxxxxxx>
- Date: 7 Oct 2006 08:14:04 -0700
illywhacker wrote:
Reef Fish wrote:
AB wrote:
how can I find an HPD using a statistical software (e.g. Mathematica)?
By HPD do you mean VALUE of the variable that corresponds to
the highest point of the posterior density in a problem using a
Bayesian method of inference?
You need to be able to supply the Bayesian ingredients of a prior
distribution for your parameter, be able to perform the integration to
obtain your posterior distribution of your PARAMETER
Actually, you do not need to perform the integration as the
normalizing constant does not depend on the parameter you
wish to estimate, and hence does not affect the
maximization.
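A minimal sketch of the point just made. The binomial data (7 successes in 10 trials), the triangular prior, and the grid are illustrative assumptions, not taken from the thread. The maximizer of the unnormalized product prior x likelihood is the same grid point as the maximizer of the normalized posterior:

```python
# Illustrative assumptions: 7 successes in 10 trials, triangular prior on [0, 1].

def prior(theta):
    # Triangular prior peaked at 0.5 (not conjugate to the binomial likelihood)
    return 1.0 - abs(2.0 * theta - 1.0)

def likelihood(theta, successes=7, trials=10):
    # Binomial likelihood with the constant binomial coefficient dropped
    return theta ** successes * (1.0 - theta) ** (trials - successes)

grid = [i / 1000 for i in range(1001)]
unnormalized = [prior(t) * likelihood(t) for t in grid]

# Riemann-sum approximation of the normalizing constant
step = grid[1] - grid[0]
constant = sum(unnormalized) * step
normalized = [u / constant for u in unnormalized]

# Locate the maximizer before and after normalization
map_unnormalized = grid[max(range(len(grid)), key=unnormalized.__getitem__)]
map_normalized = grid[max(range(len(grid)), key=normalized.__getitem__)]
```

Dividing by a positive constant rescales every value by the same factor, so the location of the maximum is unchanged.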
Your comments in this post clearly indicated that: (1) You are NOT
a Bayesian; (2) You don't know what Bayesian Statistics is about;
(3) you've NEVER assessed a prior distribution that reflects your
own opinion or belief about a parameter; and (4) you have never
carried out a Bayesian analysis in an applied and realistic way.
Your comments are of the type that might impress someone like
yourself who doesn't know anything about Bayesian statistics, and
all they know are some buzz words in the discipline. I'll be explicit
about the various ERRORS of yours in this post, about your use
of Bayesian buzz words.
Your first MAJOR error is in the paragraph above, about "you do
not need to perform the integration". Except for the use of
conjugate priors (which are seldom realistic for the personal prior
of anyone) or diffuse priors in stable estimation when the likelihood
function is sharp, you always need to integrate the product of the
prior and the likelihood function to get the posterior distribution.
It is true that the posterior distribution needs to be known
only up to a proportionality constant (which is why the proportionality
constants in a joint likelihood function can be discarded before the
integration), but you always have to integrate! That's why Robert
Schlaifer wrote an entire BOOK providing numerical methods for
assessing (AND INTEGRATING) mostly univariate prior distributions
that are NOT of the conjugate type but are needed to reflect one's
true prior beliefs.
http://links.jstor.org/sici?sici=0162-1459(197312)68%3A344%3C1023%3ACPFEDA%3E2.0.CO%3B2-%23
Computer Programs for Elementary Decision Analysis
He is one of a very few Bayesians who has even done that, and I
applauded him heartily in my review of his book for JASA.
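A minimal sketch of the kind of numerical integration described above, for a prior that is not of the conjugate type. The triangular prior, the binomial data, and the helper name trapezoid are assumptions made for illustration; this is not Schlaifer's method or code:

```python
def trapezoid(f, a, b, n=2000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def prior(theta):
    # Assumed non-conjugate prior: triangular on [0, 1], peaked at 0.5
    return 1.0 - abs(2.0 * theta - 1.0)

def likelihood(theta, successes=7, trials=10):
    # Assumed data: 7 successes in 10 Bernoulli trials
    return theta ** successes * (1.0 - theta) ** (trials - successes)

def unnormalized_posterior(theta):
    return prior(theta) * likelihood(theta)

# Integral of prior x likelihood over the parameter space
evidence = trapezoid(unnormalized_posterior, 0.0, 1.0)

def posterior(theta):
    return unnormalized_posterior(theta) / evidence

# Quantities that need the normalized posterior, unlike the mode
total_mass = trapezoid(posterior, 0.0, 1.0)
posterior_mean = trapezoid(lambda t: t * posterior(t), 0.0, 1.0)
```

Posterior means, probabilities, and interval statements all need the normalized posterior, which is where the integration enters.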
Beyond ONE dimension, I have not seen any Bayesian carefully
construct or argue why his prior distribution is represented by the
mathematical form for the parameter, in a bivariate case, or one
in a higher dimension. Those who do that kind of "Bayesian
Statistical Analysis" are mostly non-Bayesians, pseudo-Bayesians,
or just "mathematical statisticians" who can sling a few matrices
around without having the slightest idea of what they are doing, in
terms of representing his own PRIOR beliefs!
Thus, my comments to the OP were general (for ALL dimensions)
and sufficiently non-technical to cover all problems, though I was
thinking mostly of a posterior distribution of ONE (perhaps two)
dimensions.
Most statistical
software should have some method for finding the maximum of a
function -- that part should be easy.
Whether the optimization is easy very much depends on the
dimensionality of the parameter space on which the density
is defined, and on the complexity of the density itself. It
can be an almost impossible task to find a global (as
opposed to local) maximum in a high-dimensional space,
hence the large amount of research devoted to this topic.
Give me an example of a THREE dimensional PRIOR
distribution used by any Bayesian, that is of the kind that is
expressed in a mathematical form that requires a careful
elicitation to arrive at that prior, and invariably would require
numerical integration because the product of the prior and
likelihood function is not in a form integrable by usual
mathematical integration methods. You can't even find
a TWO dimension PRIOR of that kind without the user
resorting to a conjugate prior because that's the ONLY
form that can be easily integrated, but unfortunately NOT
because they argued the form realistically reflected their
own prior.
THESE are the basic elements few Bayesians ever struggled
past to make any useful or realistically personal Bayesian
inference. That's the inherent DEFECT of Bayesian
statistics -- the difficulty in the EXECUTION, though perfect
in theory and foundation of how statistics should be done.
There is also a conceptual difficulty with MAP (HPD)
estimation. While Bayes' theorem supplies you with a
measure on the parameter space, in order to perform the
maximization you need a function, i.e. a density.
WHY would you need a density function? I thought you had
understood that a posterior distribution has to be known only
up to a proportionality constant. Whether the posterior is a
genuine density function (integrable to 1) or just proportional
to a density function, the MAXIMUM of the function (or the
parameter value at which it occurs) is generally easy to find.
For one or two dimensions, many discrete search methods in
numerical analysis would apply. The function does not even
need to be continuous or integrable.
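A rough sketch of such a discrete search in two dimensions. The test function (a bump with a jump discontinuity), the search box, and the grid spacing are hypothetical choices for illustration. No derivatives or integrals are used:

```python
import math

def unnormalized_posterior(a, b):
    # Hypothetical function: a smooth bump scaled down for a < 0,
    # so it has a jump discontinuity along the line a = 0
    bump = math.exp(-((a - 1.0) ** 2 + (b + 0.5) ** 2))
    return bump if a >= 0.0 else 0.25 * bump

def grid_argmax(f, lo, hi, steps):
    """Evaluate f on a (steps + 1) x (steps + 1) grid and keep the largest value."""
    best_point, best_value = None, -math.inf
    for i in range(steps + 1):
        a = lo + (hi - lo) * i / steps
        for j in range(steps + 1):
            b = lo + (hi - lo) * j / steps
            value = f(a, b)
            if value > best_value:
                best_point, best_value = (a, b), value
    return best_point, best_value

# Search the square [-3, 3] x [-3, 3] with spacing 0.05
point, value = grid_argmax(unnormalized_posterior, -3.0, 3.0, 120)
```

The cost grows as the number of grid points raised to the dimension, which is why this kind of search is practical only in one or two dimensions.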
What follows below is complete utter nonsense of
"mathematistry". The finding of a maximum will be in the
Euclidean one- or two-dimensional space of the REAL WORLD,
where measure and integration do not come into play --
not even calculus is necessary.
You
therefore need an underlying measure with respect to which
you can define the density of the posterior. The problem is
that different underlying measures give rise to different
estimates. (A similar thing happens with MMSE estimates
too.) Since the most common procedure is simply to drop the
'd\theta' symbol in the density (\theta is the parameter),
i.e. to use 'Lebesgue measure' in the current coordinate
system on the parameter space as the underlying measure,
different choices of coordinates, e.g. using \theta^{2}
instead of \theta for a positive parameter, lead to
different estimates. In order to get around this, it is
necessary to choose an underlying measure that is invariant
to the choice of coordinates on the parameter space.
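A small numerical illustration of the coordinate dependence described above. The Gamma(shape 3, rate 1) posterior for a positive parameter theta is an assumption chosen for illustration. The same posterior measure is maximized once as a density in theta and once as a density in phi = theta^2:

```python
import math

SHAPE, RATE = 3.0, 1.0  # assumed Gamma(shape=3, rate=1) posterior for theta > 0

def density_theta(theta):
    # Unnormalized Gamma density in the theta coordinate
    return theta ** (SHAPE - 1.0) * math.exp(-RATE * theta)

def density_phi(phi):
    # The same measure written as a density in phi = theta**2:
    # p_phi(phi) = p_theta(sqrt(phi)) * |d theta / d phi|
    #            = p_theta(sqrt(phi)) / (2 * sqrt(phi))
    theta = math.sqrt(phi)
    return density_theta(theta) / (2.0 * theta)

def argmax_on_grid(f, lo, hi, steps=100000):
    # Skip the left endpoint so that phi = 0 is never evaluated
    points = (lo + (hi - lo) * i / steps for i in range(1, steps + 1))
    return max(points, key=f)

mode_theta = argmax_on_grid(density_theta, 0.0, 10.0)   # maximize in theta
mode_phi = argmax_on_grid(density_phi, 0.0, 100.0)      # maximize in phi
mode_theta_via_phi = math.sqrt(mode_phi)                # map back to theta
```

Under these assumptions the maximum in the theta coordinate is at theta = (shape - 1) / rate = 2, while the maximum in the phi coordinate maps back to theta = (shape - 2) / rate = 1, so the two "MAP estimates" disagree.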
The most sensible choice is to use a Riemannian metric to
define the underlying measure, since a metric generates
coherent notions of volume (MAP estimates) and distance
(MMSE estimates). The question is then, how to choose the
metric? If we are just given a density on some space, there
seems to be no good way to do this. However, in the context
of Bayes' theorem, there are good arguments for choosing
the Fisher information metric constructed from the
likelihood, as this is the only choice that does not
introduce extra information. In the case that the prior is
Jeffreys' prior (the volume element of the Fisher
information metric), this leads to maximum likelihood as
the estimation method.
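The last step can be written out as follows. This is a sketch in notation assumed here rather than taken from the thread: L is the likelihood, \pi the prior density in the current coordinates, and I(\theta) the Fisher information matrix.

```latex
% Posterior measure from Bayes' theorem:
P(d\theta \mid x) \;\propto\; L(\theta)\,\pi(\theta)\,d\theta

% Underlying measure: the volume element of the Fisher information metric
\mu(d\theta) = \sqrt{\det I(\theta)}\,d\theta

% Density of the posterior with respect to \mu:
\frac{dP}{d\mu}(\theta) \;\propto\; \frac{L(\theta)\,\pi(\theta)}{\sqrt{\det I(\theta)}}

% With Jeffreys' prior, \pi(\theta) \propto \sqrt{\det I(\theta)}, the ratio cancels:
\frac{dP}{d\mu}(\theta) \;\propto\; L(\theta)
\quad\Longrightarrow\quad
\hat\theta_{\mathrm{MAP}} = \arg\max_{\theta} L(\theta) = \hat\theta_{\mathrm{ML}}
```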
Priors based on Fisher's information matrix, or Jeffreys' and
other kinds of uninformative prior, are precisely the kind of
EXCUSES used by non-Bayesians and pseudo-Bayesians
to justify certain aspects of NON-Bayesian inference, such
as that of MLE, by arguing that they don't know ANYTHING
about the parameters on which they are making statistical
inference.
That is the opposite extreme of a TRUE Bayesian who
incorporates whatever he knows about the parameters (in
his prior distribution) so that his inference depends not only
on the likelihood function from the DATA, but how it blends
in with his own prior information or beliefs.
That is what Bayesian Statistics is all about.
It is NOT an exercise in "Mathematistry" or pretending that
one knows NOTHING about the parameters.
To return to the OP's question: the first thing to know is
that there is no algorithm that will guarantee to find the
global maximum for an arbitrary function. The difficulty of
your problem largely depends first, on the dimensionality
of your space, and second, on the complexity of your
density.
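A minimal sketch of the local-versus-global issue in one dimension. The bimodal function, the step size, and the starting points are hypothetical. Plain gradient ascent converges to whichever maximum lies uphill from its starting point:

```python
import math

def density(x):
    # Hypothetical unnormalized bimodal density:
    # a small bump near x = -2 and a taller bump near x = 3
    return 0.4 * math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)

def gradient(x, h=1e-6):
    # Central finite-difference approximation of the derivative
    return (density(x + h) - density(x - h)) / (2.0 * h)

def gradient_ascent(start, step=0.5, iterations=2000):
    # Repeatedly move uphill by a fixed multiple of the gradient
    x = start
    for _ in range(iterations):
        x += step * gradient(x)
    return x

local_result = gradient_ascent(-3.0)   # starts in the basin of the smaller bump
global_result = gradient_ascent(1.5)   # starts in the basin of the taller bump
```

The first run stops near x = -2 and never sees the taller bump near x = 3. Restarting from several points is a common workaround, not a guarantee.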
You continue to bark up the wrong tree.
Matlab has an fminsearch function that looks for an
unconstrained minimum of a multivariable function, but may
only return a local minimum. (Obviously finding a maximum
is the same as finding a minimum of the negative of the
function.) The Matlab Optimization toolbox provides
fminunc, which addresses the same problem, as well as other
optimization tools.
Mathematica has the functions NMinimize and FindMinimum,
with similar goals.
These are all standard mathematical methods of optimization
in NUMERICAL ANALYSIS, and they are the least of the problems in
performing any Bayesian analysis and inference.
There is no real substitute, though, for analysing your
function and trying to find out as much as you can about
it, and then writing your own code. Many optimization
algorithms are not hard to implement (gradient descent,
simulated annealing), and can often provide good results
with all your assumptions known, because you wrote the
code.
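As an indication of how little code one of these algorithms needs, here is a bare-bones simulated annealing sketch in plain Python. The target function, the Gaussian proposal, the cooling schedule, and the fixed seed are all assumptions made for illustration, not a tuned implementation:

```python
import math
import random

def density(x):
    # Hypothetical unnormalized bimodal density (taller bump near x = 3)
    return 0.4 * math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)

def simulated_annealing(f, start, iterations=20000, temperature=1.0,
                        cooling=0.9995, seed=0):
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    current, current_value = start, f(start)
    best, best_value = current, current_value
    for _ in range(iterations):
        # Propose a random move around the current point
        candidate = current + rng.gauss(0.0, 1.0)
        candidate_value = f(candidate)
        # Always accept uphill moves; accept downhill moves with a
        # probability that shrinks as the temperature falls
        delta = candidate_value - current_value
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_value = candidate, candidate_value
            if current_value > best_value:
                best, best_value = current, current_value
        temperature *= cooling  # geometric cooling schedule
    return best, best_value

best_x, best_value = simulated_annealing(density, start=-3.0)
```

Because downhill moves are sometimes accepted, the search can leave the basin it starts in, but the result still depends on the schedule and the proposal width, and nothing guarantees that it ends at the global maximum.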
illywhacker;
More buzz words in numerical analysis and numerical
optimization, while MISSING the entire BOAT of Bayesian
Statistical inference and analysis.
-- Reef Fish Bob.