Re: On Bayes
- From: kilian heckrodt <kilianheckrodt@xxxxxxxxx>
- Date: Tue, 05 Jun 2007 23:09:42 +0200
Jim Ferry wrote:
On Jun 4, 1:32 pm, Paulo Matos <pocma...@xxxxxxxxx> wrote:
Hi all,
I'm trying to work out a Bayesian probability which is getting me
confused.
Suppose I have documents of 1000 words and I'm considering 3 classes
of documents X, Y, Z (say prior probabilities are 10, 20 and 70%
respectively). I've estimated that:
Docs of type X have:
10 'bye'
15 'hello'
35 'english'
Docs of type Y have:
15 'bye'
12 'hello'
40 'english'
Docs of type Z have:
30 'bye'
18 'hello'
35 'english'
Can I compute from this the conditional probability P(doc having 27
'bye' | doc of type X)?
The problem is not well specified as it stands. You need some model
to specify the probability of getting b 'bye', h 'hello', and e
'english' in a document of type X (and similarly for Y and Z),
If I read his notation correctly those are given:
For instance
>> Docs of type X have:
>> 10 'bye'
>> 15 'hello'
>> 35 'english'
means P(b|X)=0.1, P(h|X)=0.15, P(e|X)=0.35
I assume he has skipped the % sign.
which
we will denote P(b,h,e|X). A natural model to use is the multinomial
distribution:
P(b,h,e|X) = pb^b ph^h pe^e (1-pb-ph-pe)^(n-b-h-e) C(n,b,h,e),
where
* n = 1000,
* pb, ph, and pe are the probabilities of getting 'bye', 'hello',
and 'english', respectively, which we set to 10/1000, 15/1000, and
35/1000 (po = 1-pb-ph-pe is the probability of any other word),
* and C(n,b,h,e) = n!/(b! h! e! (n-b-h-e)!) is the multinomial
coefficient.
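A minimal Python sketch of the multinomial formula above, for anyone who wants to plug in numbers. The function name multinomial_pmf and the counts (10, 15, 35) are illustrative assumptions, not part of the original post; the computation is done in log space only to avoid overflow in the factorials.

```python
import math

def multinomial_pmf(counts, probs, n):
    """P(b,h,e|class) under the multinomial model; leftover words form an 'other' bin."""
    other_count = n - sum(counts)
    if other_count < 0:
        raise ValueError("tracked counts exceed document length")
    all_counts = list(counts) + [other_count]
    all_probs = list(probs) + [1.0 - sum(probs)]
    # log of the multinomial coefficient n! / (b! h! e! (n-b-h-e)!)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in all_counts)
    # log of pb^b ph^h pe^e (1-pb-ph-pe)^(n-b-h-e)
    log_p += sum(c * math.log(p) for c, p in zip(all_counts, all_probs))
    return math.exp(log_p)

# Illustrative counts (an assumption): exactly the expected counts for type X.
p_x = multinomial_pmf((10, 15, 35), (0.010, 0.015, 0.035), 1000)
print(p_x)
```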
You can sum over h and e to get the probability you specified:
P(b|X) = pb^b (1-pb)^(n-b) C(n,b),
I don't quite follow you here. Why do you set up a binomial/multinomial
distribution if the distribution is already given (and there are no
repeated trials with it)? P(b|X) = pb is given in the problem description.
which is simply a binomial distribution. P(27|X) = 3.6 x 10^-6, for
example, whereas P(10|X) = 0.126.
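The binomial figures can be checked with a few lines of Python. This is only a sketch under the model described above (n = 1000 words, pb = 10/1000 for type X); binomial_pmf is a hypothetical helper name, and math.comb needs Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    """P(k successes in n trials) = C(n,k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, pb = 1000, 10 / 1000          # document length and 'bye' rate for type X
p27 = binomial_pmf(27, n, pb)    # compare with the 3.6 x 10^-6 quoted above
p10 = binomial_pmf(10, n, pb)    # compare with the 0.126 quoted above
print(f"P(27|X) = {p27:.2e}, P(10|X) = {p10:.3f}")
```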
Typically, the problem of interest in this situation is classifying a
doc as type X, Y, or Z given the counts b, h, and e (or some subset of
them). Letting P0(X) be the prior probability of X, etc., we have
P(X|b,h,e) = P(X,b,h,e)/P(b,h,e) = P(b,h,e|X) P0(X) / (P(b,h,e|X)
P0(X) + P(b,h,e|Y) P0(Y) + P(b,h,e|Z) P0(Z)),
where (P0(X),P0(Y),P0(Z)) = (.1,.2,.7) in this case.
For example, if (b,h,e) = (20,15,35), then
P(b,h,e|X) P0(X) = 1.3 x 10^-6,
P(b,h,e|Y) P0(Y) = 3.0 x 10^-5, and
P(b,h,e|Z) P0(Z) = 4.7 x 10^-5,
whence
P(X|b,h,e) = 0.016,
P(Y|b,h,e) = 0.386, and
P(Z|b,h,e) = 0.598.
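The classification step can be sketched in Python as follows. The helper names are hypothetical, and the per-class rates assume the third count listed for types Y and Z refers to 'english' (40 and 35 per 1000 words); if that reading is wrong the numbers change. Working with logs and subtracting the largest value before exponentiating is just a guard against underflow, not part of the original argument.

```python
import math

def log_multinomial(counts, probs, n):
    """Log of the multinomial probability, with leftover words in an 'other' bin."""
    all_counts = list(counts) + [n - sum(counts)]
    all_probs = list(probs) + [1.0 - sum(probs)]
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in all_counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(all_counts, all_probs))

def posterior(counts, rates, priors, n=1000):
    """Bayes' rule: P(class|b,h,e) is proportional to P(b,h,e|class) P0(class)."""
    log_joint = {c: log_multinomial(counts, rates[c], n) + math.log(priors[c])
                 for c in rates}
    top = max(log_joint.values())                      # for numerical stability
    weights = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# (pb, ph, pe) per class; the Y and Z 'english' rates are an assumed reading.
rates = {"X": (0.010, 0.015, 0.035),
         "Y": (0.015, 0.012, 0.040),
         "Z": (0.030, 0.018, 0.035)}
priors = {"X": 0.1, "Y": 0.2, "Z": 0.7}

post = posterior((20, 15, 35), rates, priors)
for name, p in post.items():
    print(f"P({name}|b,h,e) = {p:.3f}")
```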
(Note: because the word frequencies are low relative to the size of the
text, you could also approximate the counts as independent Poisson
variables, which are easier to work with. E.g., P(b|X) = e^-10 10^b / b!,
which yields P(27|X) = 4.2 x 10^-6.)
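For comparison, a short Python sketch of the Poisson approximation with mean 10 (that is, n * pb = 1000 * 10/1000 for type X). The name poisson_pmf is a hypothetical helper, not something from the original post.

```python
import math

def poisson_pmf(k, lam):
    """P(k) = e^-lam lam^k / k!, evaluated in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam_bye_x = 1000 * (10 / 1000)            # expected number of 'bye' in a type X doc
p27_poisson = poisson_pmf(27, lam_bye_x)  # compare with the 4.2 x 10^-6 quoted above
print(f"Poisson P(27|X) = {p27_poisson:.2e}")
```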
-Jim Ferry
Metron, Inc.
f rr @m tsc .c m
e y e i o