Re: Highest Posterior Density




illywhacker wrote:
Dear Bob,

You are so scared that you may be forced to make a mental
effort and disturb years of rigidified thinking, that you
do everything you can to avoid dealing with Bayesian
methods.

That is not a very good way to start your excuse of my errors
about Bayesian methods. You had once claimed that the only
real Bayesians are the physicists and you learned it from them.
That says it all.


Your technique for doing this is admirably subtle,
however. You pick a particular Bayesian school (one for
which you can, if pressed, use worthless but impressive
arguments based on prestige, since you knew one or more of
the people involved in its founding), and then attack
everyone else for failing to conform to it, or at least
your vision of it. Now of course, those who use Bayesian
methods in practice cannot conform perfectly to such
religious dogma because they deal with the 'real world', as
you call it. In this way you can brand everyone who uses
Bayesian methods as a fake, and thus effectively dismiss
them and safeguard your indolence. Should you use this
technique if you choose to reply to this post, everyone
will notice.

I did not dismiss Robert Schlaifer, but applauded his effort
to provide tools to make the application of Bayesian methods
more realistic and useful. I did not dismiss MOST of the
Bayesians I know personally. The only exception is Arnold
Zellner, who is a "matrix slinger". He took a multivariate
prior that is mathematically integrable, matched it with a
sample mean, and called it his own prior. That's the mind
of mindless Bayesians I speak out against.

In your case, you are deficient even in the BASICs of
Bayesian statistics, as evidenced by the explicit exhibits
I showed in my post which you snipped, only to misquote
some in your own words in this post.

Now to your specific points. Your post is so long and
rambling that it is pointless to answer it line by line in
the time-honoured Usenet fashion. Of course, my reply
assumes intellectual honesty about what you intended to
say, something that has been noticeably lacking in your
previous discussions. Still, we can but hope.

Your points are the following:

-------------------------------
1) Computation of the posterior probability distribution,
or its density, requires calculating the normalizing
constant.

That is patently false. I even specifically said you can
DISCARD the multiplicative factor in the likelihood function
because the posterior distribution needs to be determined
only to a multiplicative factor.

That was against YOUR comment that no integration is
needed because the posterior distribution can be up to a
multiplicative constant -- for entirely the WRONG REASON.

Next time, QUOTE me, because your misrepresentation
is an attempt to hide your own error.

2) Robert Schlaifer is one of a very few Bayesians who has
been concerned with numerical integration for posteriors.

NOT so much for numerical integration of posteriors but for
the assessment of REALISTIC PRIORS, which necessitated
the numerical integration of the posterior. He wrote an
entire book about his programming work.

3a) No one uses priors that you consider worthwhile in more
than one dimension.

Misquote for the 3rd consecutive time.

3b) All `real world' problems are low-dimensional.

Misquote for the 4th consecutive time. I said it's a DEFECT
of Bayesian Statistics in application that one can only deal
with one's OWN personal beliefs as a prior distribution in one
or two dimensions, and challenge you and anyone else to
show one elicited through Bayesian methods of prior elicitation
by self-interrogation (publications by Savage and others) in
three or more dimensions. There is NONE.

4) Ed Jaynes (or anyone with a similar point of view) was
not a Bayesian.

5th consecutive misquote.

He is a theoretical Bayesian whose main contribution is in
his uninformative prior, which I consider a mere evasion of
the solution of how to elicit a REAL Bayesian prior. The
indictment was the use of an UNINFORMATIVE prior as an
EXCUSE to pretend one has no information about the
prior when one has plenty of prior information but just doesn't
know HOW to put it down in mathematical form.


5) That my points about invariance are wrong, and indeed
mere 'mathematistry'. (Actually, you simply misunderstood
what I was saying. I explain in more detail below.)

6th consecutive misquote of what I wrote. I said nothing
about invariance. I ridiculed your silly mention of
Riemann and Lebesgue in an APPLIED Bayesian
question asked by the OP.


6) That the OP's question concerned trivialities because
"optimization <...> is the least of the problems in
performing any Bayesian analysis and inference".

I clarified that to mean the low dimensions that are feasible
for any realistic application. You should NOT have
snipped everything I wrote, and MISREPRESENTED
every point I made.

You made 6 itemized MISQUOTES of 7 items of what I
said!

The rest of your post is just your rationalization of your
own ignorance about Bayesian theory and methods
by replying to YOUR OWN ERRONEOUS positions
about what I said.

Your replies are just as POINTLESS as those in your
original post.

You have NOT correctly represented a SINGLE POINT
I made in my post, let alone refuted it. You merely
MISREPRESENTED (by NOT quoting me) my points
for your own convenience of obfuscation and excuse
about your own ignorance about Bayesian Statistics.

-- Reef Fish Bob.

-------------------------------

Now here are my replies.

-------------------------------
A1) Correct. But you do not need to calculate the
normalizing constant if all you want to do is to maximize
the posterior density, which was what the OP wanted to do.
Since performing the integration is often very hard, it
seems pointless to recommend it as a first step when it is
not required. It is rather like saying to someone who
wishes to change the tire on a car, 'first take out the
engine'.
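
To make A1 concrete, here is a minimal sketch. The model is my own
assumption for illustration, not anything from the OP's problem: a
Beta(a, b) prior with k successes in n trials, and made-up numbers.
It finds the posterior mode from prior times likelihood alone, then
computes the normalizing constant only to show that dividing by it
does not move the maximum.

```python
import math

def log_unnorm_posterior(theta, k=7, n=10, a=2.0, b=2.0):
    """Log of prior times likelihood for a Beta(a, b) prior and k successes
    in n trials (hypothetical numbers). Every constant factor, including
    the binomial coefficient and the Beta function, has been dropped."""
    return (a + k - 1) * math.log(theta) + (b + n - k - 1) * math.log(1 - theta)

# Grid over the open interval (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
unnorm = [math.exp(log_unnorm_posterior(t)) for t in grid]

# Posterior mode found from the unnormalized density alone.
mode_unnorm = grid[max(range(len(grid)), key=unnorm.__getitem__)]

# Normalizing constant by a simple Riemann sum. It is needed for
# probabilities and intervals, but not for locating the mode.
Z = sum(unnorm) * (grid[1] - grid[0])
normalized = [u / Z for u in unnorm]
mode_norm = grid[max(range(len(grid)), key=normalized.__getitem__)]

print(mode_unnorm, mode_norm)
```

Both modes agree with each other and, to grid accuracy, with the
closed form (a + k - 1) / (a + b + n - 2) for this conjugate case.
A grid search is used only to keep the sketch short; in more than a
couple of dimensions one would use a proper optimizer instead.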

A2) This is incorrect, even absurd. A great deal of
Bayesian research is devoted to numerical methods for
integration.

A3a) Since we do not need to accept your particular version
of `Bayesian', we do not need to pay attention to this
point.

A3b) Again, this is incorrect, even absurd. There are many
problems in the 'real world' as you call it, where the
space of parameters has a very large number of dimensions.
Signal processing provides a large class of examples, but
there are many others.

A4) This is merely a matter of definition and hence in
itself is not worth considering. However, what is worth
considering is that although the incorporation of prior
knowledge is characteristic of Bayesian approaches, the
most characteristic point, the one that makes the most
difference, is rather the acknowledgement that
probabilities have nothing to do with randomness, and that
therefore probabilities for hypotheses can be considered,
even though the hypothesis is not what an orthodox
statistician (such as yourself) would call a 'random
variable'.

A5) I am as against mathematistry as you are, Bob, but the
problem of which I speak remains. You ask why one needs a
density function (note I did not say probability density
function - normalization is indeed irrelevant). The problem
is that Bayes' theorem does not give you a function; it
gives you a measure, e.g.

d\theta f(\theta) .

To maximize you need a function, not a measure, so you have
to divide by another measure. Most of the time, people
simply drop the d\theta and maximize f (i.e. divide by the
measure d\theta). But suppose a second person was using a
different coordinate to describe the parameter, e.g.
\alpha = \theta^{2}. They would construct the same
posterior measure in the different coordinates:

d\alpha g(\alpha) ,

where f(\theta) = 2\theta g(\theta^{2}). Now this person
follows the same procedure, drops d\alpha, and maximizes g.
They will therefore not, in general, find the same estimate,
where 'the same' means \alpha* = \theta*^{2}. Clearly, however,
they should find the same estimates in this sense, as all
their information is the same.
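
A small numerical sketch of the point in A5, under assumptions of my
own choosing: the density f(\theta) = \theta^{2} exp(-\theta) on
\theta > 0 is a hypothetical example, and g is obtained from it
through the relation f(\theta) = 2\theta g(\theta^{2}). Each density
is maximized on a grid in its own coordinate.

```python
import math

def f(theta):
    """Unnormalized density in theta (hypothetical example)."""
    return theta**2 * math.exp(-theta)

def g(alpha):
    """Density of the same measure in alpha = theta^2,
    using f(theta) = 2 * theta * g(theta^2)."""
    theta = math.sqrt(alpha)
    return f(theta) / (2 * theta)

def argmax_on_grid(func, lo, hi, steps=200000):
    """Return the grid point in (lo, hi] at which func is largest."""
    h = (hi - lo) / steps
    points = [lo + i * h for i in range(1, steps + 1)]
    return max(points, key=func)

theta_star = argmax_on_grid(f, 0.0, 20.0)    # first person: maximizes f
alpha_star = argmax_on_grid(g, 0.0, 400.0)   # second person: maximizes g

print(theta_star, theta_star**2, alpha_star)
```

For this f the calculus can be done by hand: f peaks at \theta* = 2,
while g(\alpha) is proportional to \sqrt{\alpha} exp(-\sqrt{\alpha})
and peaks at \alpha* = 1, not at \theta*^{2} = 4. The two people hold
the same posterior measure and still report different estimates.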

A6) Had you ever addressed a high-dimensional inference
problem, you would never suggest that optimization is the
least of the problems involved. However, I do agree that
built-in functions in Matlab or Mathematica are not the
best way to address these problems, as I said in my post.

illywhacker;
