Re: different priors (flat, uniform, etc)




Russell wrote:
On Oct 29, 3:52 pm, "Reef Fish" <large_nassua_grou...@xxxxxxxxx>
wrote:
DZ wrote:
Anon. <bob.oh...@xxxxxxxxxxxxxxxxx> wrote:
Reef Fish wrote:
David Winsemius wrote:
"Reef Fish" wrote
Can we hear a bit more about how Beta(1,1) is an informative prior for a
binomial problem?

It CHANGES the likelihood function to form the posterior distr.

My one-line response turned out to be more succinct and penetrating
than I had thought, because that is the KEY to any PROPER prior
that is informative!

[...]
Intriguingly, Reef Fish also made this comment on this thread:
RF> The posterior distribution is the likelihood function if the prior
RF> is "diffuse" (which is NOT the same as a "uniform" or "flat" prior).

So, apparently the beta(1,1), which is also the uniform distribution, is
"diffuse" but not "uniform".

No, the uniform distribution is hardly diffuse. It is uniform AND
informative, as I had said before.

I was out of town on the weekend. I sort of took advantage of that to
see what Bayesians I could flush out of the woodwork, to show, without
any doubt, that John Uebersax and Anon Bob O'Hara were definitely
NON-Bayesians and that they were completely wrong, as I had indicated
with the few hints I gave.

The first one to surface was Herman Rubin, who mentioned some points
others followed up on, but Herman misunderstood the statement I made
about "conjugate priors" (which I corrected this morning).

David Winsemius indicated he made SOME efforts to READ what's
relevant. When he showed that he read the Edwards, Lindman, and
Savage paper, I was TEMPTED to explain to him what the score
was, since he wasn't being confrontational even though his original
post right after Anon Bob (even after my explanation) seemed to
indicate that he never read a Freshman's BOOK about Bayesian
inference, and he STILL hasn't, or else he would have solved the
mystery himself. So, I'll reveal the Da Vinci Code to him and
all when I get to his post in the afternoon, following up on Herman
Rubin's comments on his questions which I didn't answer.

Then DZ emerged. I think that pretty much exhausted ALL the
educated Bayesians in sci.stat.math, from what I can gather from
reading this group for 1 1/2 years.

DeGroot's comment on Shafer's paper "Lindley's paradox" criticized the
idea that "diffuse" should mean equal probability for all parameter
values and argued that in the normal(m,s) case, "diffuse" implies, more
appropriately, that for example m^2 might be large - that is, the
variance is large.

That is one of the meanings of the term "diffuse", and the normal
example (with a normal likelihood) is a GOOD example to say that
you CANNOT have a uniform distribution over the entire real line!
But it says more than that. It's related to Savage's "principle of
stable estimation", which gave a quantifiable meaning to
"diffuse" in the sense of "locally uniform" over a
likelihood function that is very sharp.
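As a rough illustration of the normal case discussed above, the sketch below uses the standard normal-normal conjugate update with known sampling variance and shows the posterior mean moving toward the sample mean as the prior variance grows. The function name and the numbers are made up for the example; they do not come from DeGroot, Shafer, or Savage.

```python
def normal_posterior(prior_mean, prior_var, sample_mean, n, sigma2):
    """Normal prior on the mean, normal likelihood with known variance sigma2.

    Returns (posterior_mean, posterior_variance) from the usual
    precision-weighted update.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / sigma2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var


# Hypothetical data: n = 10 observations, sample mean 5.0, sigma^2 = 4.0.
# The prior is centered at 0.0 with increasingly large variance.
for prior_var in (1.0, 100.0, 1.0e6):
    mean, var = normal_posterior(0.0, prior_var, 5.0, 10, 4.0)
    print(f"prior variance {prior_var:>9.1f}: "
          f"posterior mean {mean:.4f}, posterior variance {var:.4f}")
```

With a small prior variance the posterior mean is pulled toward the prior mean; as the prior variance grows, the posterior mean and variance approach the sample mean and sigma2 / n.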

I had used the slightly altered and simplified meaning of "diffuse"
prior to mean one that would leave the posterior exactly the
same as a normalized likelihood function, so that the non-Bayesian
MLE becomes the maximum point of the posterior for a Bayesian,
if the likelihood function and the unnormalized posterior coincide.

Similarly, in the beta prior case, beta(1,1) is
uniform, but may not be diffuse enough, because as you let both a,b
starting from beta(a=1,b=1) go to zero, the variance increases.

That is one way to look at it. But THIS was what I was pointing at, for
the Freshman textbook nobody seemed to have found for the Da
Vinci Code of the conjugate prior beta for the binomial p.

The CONJUGATE part means both the prior and the posterior
are members of the beta family. If the prior distribution of the
binomial p is beta( alpha, beta ), and r and (n-r) are the
powers of p and (1-p) in the likelihood function, then the
posterior parameters will be changed to (alpha + r) and
(beta + n - r), in the beta family!
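The update rule just stated can be sketched in a few lines. This is only an illustration in the standard (alpha, beta) parametrization, where the Beta kernel is p^(alpha-1) (1-p)^(beta-1); the function names and the sample r = 7, n = 10 are made up for the example. The prior with alpha = beta = 0 is improper and is used here purely formally.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, r, n):
    """Posterior Beta parameters after r successes in n Bernoulli trials."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return alpha + r, beta + (n - r)


def beta_mean(alpha, beta):
    """Mean of a proper Beta(alpha, beta) distribution."""
    return Fraction(alpha) / (Fraction(alpha) + Fraction(beta))


r, n = 7, 10  # hypothetical sample

# Uniform prior Beta(1, 1)
post_uniform = beta_binomial_update(1, 1, r, n)
print(post_uniform, beta_mean(*post_uniform))    # (8, 4) 2/3

# Improper prior Beta(0, 0), applied formally
post_improper = beta_binomial_update(0, 0, r, n)
print(post_improper, beta_mean(*post_improper))  # (7, 3) 7/10
```

The two priors lead to different members of the beta family for the same data, which is all the conjugate update amounts to.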

This needs one more step of explanation to show why Anon
Bob O'Hara was looking at BOTH the beta(1,1) prior and the
likelihood function and STILL missed it! That was the proof
that Bob O'Hara had never seen that Freshman book either
or any book, on how to make a Bayesian inference of the
parameter p of a Bernoulli process or a Binomial distribution.

I hope SOMEONE can manage to find a Bayesian book
(the more elementary the better) and show us what happens
when a uniform prior Beta(1,1) is applied to the Binomial
problem of p given r successes and f failures, r + f = n.

With fear of pouring gasoline on the fire, I'll mention that
_Statistical Inference and Prediction in Climatology: A
Bayesian Approach_ by Edward Epstein is as close as
I can come in my library.

What gasoline? What fire? ;-) The only things HOT are those
that came out of the mouths of our NOISIEST posters.

I don't know who Epstein is, but as I said, ANY elementary
book will do, and I think you delivered.

Chapter 3 treats Bernoulli
processes, and beta distributions as conjugate priors.
Interestingly (if I'm reading it correctly) he suggests using
r=n=0 as "vague" prior parameters. He acknowledges
that this gives a prior beta density that is undefined, but
writes, "Nevertheless, if we ignore this deficiency and
apply Eq. (3.5) using r'=n'=0 as prior parameters, then
the posterior parameters become r''=r and n''=n.

He even had the primes according to the usual cookbook
conventions. The r and n denote the SAMPLE r and n,
those in the likelihood function. r' and n' denote the
parameters in the prior distribution Beta(r',n') rather than
alpha and beta. That's because then you have the
"no brainer" of using the Beta as the conjugate prior,
because the posterior is given by (double primes)

r" = r + r' and n" = n + n'.

Right there is your Da Vinci Code for this simple result!

That's why in order to get the non-informative prior so
that the posterior is the same as the likelihood function,
r' and n' must both be zero. The improper Beta(0,0) as
Herman and David mentioned.

The
posterior density, unlike the prior, is proper (its integral
converges) if r!=0 and r!=n. In other words, if we feign
"total ignorance" and then obtain a set of data with at
least one success and one failure, then the resulting
posterior density is a mathematically proper form...".
However, he thinks that generally a more informative
prior is almost always available to the knowledgeable
analyst.
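The primed update quoted from Epstein, together with his properness condition, can be sketched as follows. The sketch ASSUMES that his Beta(r', n') has kernel p^(r'-1) (1-p)^(n'-r'-1), so that r' plays the role of alpha and n' - r' the role of beta. The quoted passage does not state that kernel, but it is consistent with the condition he gives (r != 0 and r != n). The function names and sample numbers are made up for the example.

```python
def epstein_update(r_prior, n_prior, r, n):
    """Posterior (r'', n'') from prior (r', n') and sample (r, n)."""
    return r_prior + r, n_prior + n


def is_proper(r_param, n_param):
    """Properness under the ASSUMED kernel p^(r-1) * (1-p)^(n-r-1).

    The integral over (0, 1) converges only when both exponents exceed -1,
    that is, when 0 < r < n.
    """
    return 0 < r_param < n_param


# "Vague" prior r' = n' = 0 is itself improper.
print(is_proper(0, 0))

# Hypothetical samples of n = 10 trials with r successes.
for r in (0, 3, 10):
    r_post, n_post = epstein_update(0, 0, r, 10)
    print(r, (r_post, n_post), is_proper(r_post, n_post))
```

Only the sample with at least one success and one failure yields a proper posterior, which matches the condition in the quoted passage.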

Of course. Even if one feels any p is as likely as another,
you have the UNIFORM, which is informative!

Your posterior will have parameters (r + 1) and (n + 1).

Epstein then goes on to work out some examples with
more informative priors, but none specifically with a
Beta(1,1) prior. But if any of the readers are interested,
that's a reference on the subject, FWIW.


Meanwhile, I'll take a short break before explaining it in my
reply to David Winsemius's latest post of Sun, Oct 29 2006
1:53 pm, which contains both Herman Rubin's reply
yesterday, and a very relevant webpage provided by David.

Before I go there, as I had expected, any textbook would have
answered the question that the uniform is NOT noninformative.

The one little catch that tripped O'Hara was that in order for
the POSTERIOR distribution to remain the original likelihood
function (for the NON-Bayesians), the original likelihood
function MUST be written as if it were a Beta so that when it
is combined with Beta(0,0), it'll still be a Beta which is the
"conjugate" part.

Now watch carefully. :-)

The NON-Bayesian likelihood for r successes out of n trials
is proportional to
p^r (1 - p)^(n-r)

Even O'Hara knew that, with a slight change of notation:

BO> L(p| r) = p^n (1-p)^(N-n)

But the L(p|r) is NOT in the form of a Beta density! The kernel
of the Beta density has (alpha -1) and (beta - 1) in the
exponents!

That's why, in the form of a Beta density, the r and n of
the likelihood function must be parametrized as (r+1) and (n+1).
The posterior BETA, from the Beta Prior (alpha, beta) will
be Beta (r+1+ alpha, n+1+beta), which is why alpha and
beta need to be both ZERO for the posterior to be

Beta(r+1, n+1) which is the original likelihood function

p^r (1 - p)^(n-r)

Now you can go to my post requesting a Bayesian textbook
and pick out exactly where Bob O'Hara erred. Since he is
not a Bayesian nor Bayesian statistics trained, he had
trouble relating a likelihood function (which is NOT a density
in the PARAMETER) to a Bayesian distribution for prior and
posterior alike, that is a distribution of the PARAMETER of
interest, p in the Bernoulli case.

-- Reef Fish Bob.


Stay tuned.

-- Reef Fish Bob.

Cheers,
Russell
