Re: On Bayes



On Jun 4, 1:32 pm, Paulo Matos <pocma...@xxxxxxxxx> wrote:
Hi all,

I'm trying to work out a Bayesian probability which is getting me
confused.
Suppose I have documents of 1000 words and I'm considering 3 classes
of documents X, Y, Z (say prior probabilities are 10, 20 and 70%
respectively). I've estimated that:
Docs of type X have:
10 'bye'
15 'hello'
35 'english'
Docs of type Y have:
15 'bye'
12 'hello'
40 'english'
Docs of type Z have:
30 'bye'
18 'hello'
35 'english'

Can I compute from this the conditional probability P(doc having 27
'bye' | doc of type X) ?

The problem is not well specified as it stands. You need some model
to specify the probability of getting b 'bye', h 'hello', and e
'english' in a document of type X (and similarly for Y and Z), which
we will denote P(b,h,e|X). A natural model to use is the multinomial
distribution:

P(b,h,e|X) = pb^b ph^h pe^e (1-pb-ph-pe)^(n-b-h-e) C(n,b,h,e),

where

* n = 1000,
* pb, ph, and pe are the probabilities of getting 'bye', 'hello',
and 'english', respectively, which we set to 10/1000, 15/1000, and
35/1000 for type X (any other word has probability po = 1-pb-ph-pe),
* and C(n,b,h,e) = n!/(b! h! e! (n-b-h-e)!) is the multinomial
coefficient.
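
As a rough sketch of the multinomial formula above (Python, standard library only; the function name multinomial_pmf and its argument layout are my own choices for illustration, not anything from the original post), working in logs so the factorials stay manageable:

```python
from math import exp, lgamma, log

def multinomial_pmf(counts, probs, n):
    """Probability of seeing `counts` of the tracked words in an n-word doc.

    `probs` are the per-word probabilities of the tracked words; every other
    word is lumped into one leftover category with probability 1 - sum(probs).
    """
    other_count = n - sum(counts)
    other_prob = 1.0 - sum(probs)
    # log of n! / (b! h! e! (n-b-h-e)!), using lgamma(k + 1) = log(k!)
    log_p = lgamma(n + 1) - lgamma(other_count + 1)
    log_p += other_count * log(other_prob)
    for k, p in zip(counts, probs):
        log_p += k * log(p) - lgamma(k + 1)
    return exp(log_p)

# Type X parameters from the post: 10, 15 and 35 occurrences per 1000 words.
p_x = (10 / 1000, 15 / 1000, 35 / 1000)
print(multinomial_pmf((10, 15, 35), p_x, 1000))
```

The log-space form is only a convenience; for n = 1000 the direct product of powers and factorials would also work with Python integers, just less comfortably.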

You can sum over h and e to get the probability you specified:

P(b|X) = pb^b (1-pb)^(n-b) C(n,b),

which is simply a binomial distribution. P(27|X) = 3.6 x 10^-6, for
example, whereas P(10|X) = 0.126.
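
A quick way to check those two binomial values, as a sketch in Python (binomial_pmf is just a name I picked):

```python
from math import comb

def binomial_pmf(b, n, p):
    """P(b | X) = C(n,b) p^b (1-p)^(n-b): probability of exactly b hits in n words."""
    return comb(n, b) * p**b * (1 - p) ** (n - b)

n, pb = 1000, 10 / 1000  # 1000-word doc, 'bye' rate for type X
print(binomial_pmf(10, n, pb))  # compare with the 0.126 quoted above
print(binomial_pmf(27, n, pb))  # compare with the 3.6 x 10^-6 quoted above
```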

Typically, the problem of interest in this situation is classifying a
doc as type X, Y, or Z given the counts b, h, and e (or some subset of
them). Letting P0(X) be the prior probability of X, etc., we have

P(X|b,h,e) = P(X,b,h,e)/P(b,h,e) = P(b,h,e|X) P0(X) / (P(b,h,e|X)
P0(X) + P(b,h,e|Y) P0(Y) + P(b,h,e|Z) P0(Z)),

where (P0(X),P0(Y),P0(Z)) = (.1,.2,.7) in this case.

For example, if (b,h,e) = (20,15,35), then

P(b,h,e|X) P0(X) = 1.3 x 10^-6,
P(b,h,e|Y) P0(Y) = 3.0 x 10^-5, and
P(b,h,e|Z) P0(Z) = 4.7 x 10^-5,

whence

P(X|b,h,e) = 0.016,
P(Y|b,h,e) = 0.386, and
P(Z|b,h,e) = 0.598.
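
For anyone who wants to reproduce the classification step, here is a minimal sketch in Python. It assumes the multinomial model above, and it assumes the third count listed for types Y and Z is 'english' (40 and 35). The names PRIORS, EXPECTED, log_likelihood and posterior are mine, purely for illustration.

```python
from math import exp, lgamma, log

N_WORDS = 1000
PRIORS = {"X": 0.1, "Y": 0.2, "Z": 0.7}
# Estimated occurrences of ('bye', 'hello', 'english') per 1000-word doc.
# Assumption: the third figure for Y and Z refers to 'english'.
EXPECTED = {"X": (10, 15, 35), "Y": (15, 12, 40), "Z": (30, 18, 35)}

def log_likelihood(counts, expected, n=N_WORDS):
    """log P(b,h,e | class) under the multinomial model."""
    probs = [c / n for c in expected]
    other = n - sum(counts)
    ll = lgamma(n + 1) - lgamma(other + 1) + other * log(1 - sum(probs))
    for k, p in zip(counts, probs):
        ll += k * log(p) - lgamma(k + 1)
    return ll

def posterior(counts):
    """Bayes' rule: likelihood times prior for each class, then normalize."""
    log_joint = {c: log_likelihood(counts, EXPECTED[c]) + log(PRIORS[c])
                 for c in PRIORS}
    top = max(log_joint.values())  # subtract the max before exp for stability
    weights = {c: exp(v - top) for c, v in log_joint.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

for doc_class, prob in posterior((20, 15, 35)).items():
    print(doc_class, round(prob, 3))
```

With (b,h,e) = (20,15,35) this should land close to the 0.016, 0.386 and 0.598 given above.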

(Note: because the word frequencies are low relative to the size of
the text, you could also get by approximating the counts as
independent Poisson variables, which are easier to work with. E.g.,
with mean n pb = 10, P(b|X) = e^-10 10^b/b!, which yields P(27|X) =
4.2 x 10^-6.)
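
The Poisson shortcut is a one-liner to check; a sketch in Python (poisson_pmf is my own name for it):

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(k) = e^-mean mean^k / k! for a Poisson count with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

# Mean count of 'bye' in a type X doc: 1000 words * 10/1000 = 10.
print(poisson_pmf(27, 10))  # compare with the 4.2 x 10^-6 quoted above
```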

-Jim Ferry
Metron, Inc.
f rr @m tsc .c m
e y e i o
