Re: binomial 'association' measure?
From: Dan Bolser (dmb_at_mrc-dunn.cam.ac.uk)
Date: 10/03/04
- Next message: Ian Jermyn: "Re: A simple but confusing question"
- Previous message: Peter Michaux: "Re: Sample Size for Emperical CDF"
- In reply to: Aleks Jakulin: "Re: binomial 'association' measure?"
- Next in thread: Aleks Jakulin: "Re: binomial 'association' measure?"
- Reply: Aleks Jakulin: "Re: binomial 'association' measure?"
- Messages sorted by: [ date ] [ thread ]
Date: Sun, 3 Oct 2004 15:56:28 +0100
Hi, thanks very much for the reply. Thanks especially for the paper
reference. The whole paper is very exciting for me! I find the formula for
the k-way interaction information very interesting!
One thing I should state to give you a better understanding of my
problem(s): I am a bio-chemist by training and I dropped maths a long
time ago. Today I am studying bioinformatics, so I have a great need for
good statistical understanding. In this respect I have improved my maths
somewhat, but simple 'basics' can often be lacking in my maths toolbox,
because I never had a ground-up education in more advanced maths.
I have asked some questions below and made some comments.
On Sat, 2 Oct 2004, it was written:
>Dan Bolser wrote:
>> Hi, I am trying to calculate the significance of association of two
>> events by counting their coincidence and correcting for their
>> occurrence.
>>
>> I have two ways to look at the problem in binomial terms, and I want
>> to know if they are equivalent.
>
>What you have here are two models:
>
>* P(A,B) is multinomial, with four probabilities
>* P(A)P(B), a product of two independent binomial models
I think what you are saying is,
model 1) A and B 'interact',
model 2) A and B are independent ?
I don't understand why P(A)P(B) is "a product of two independent binomial
models".
I am using the binomial distribution to assess the probability of a
particular instance of AB (ab) given the (marginalized?) probability of an
instance of A (a) and the (marginalized?) probability of an instance of B
(b). Why are p(a) and p(b) "independent binomial models"?
Marginalization is probably one of those basics I should know... I think
the explanation in your paper is quite clear... Hopefully I use the
term correctly above or else confusion is mounting ;)
If I want the 'independent' probability of an instance of A (a), I sum
over the probabilities of all instances of ab for every instance of B (b)
(aB?).
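A minimal sketch of that summation, assuming a small 2x2 table of
illustrative counts (A = bag, B = colour); the counts and the helper name
`marginal` are only for illustration. It also shows what the product
P(A)P(B) looks like next to the joint P(A,B):

```python
from fractions import Fraction

# Illustrative counts for two binary attributes (assumed data):
# A in {left, right}, B in {red, blue}.
counts = {
    ("left", "red"): 4,
    ("left", "blue"): 46,
    ("right", "red"): 1,
    ("right", "blue"): 99,
}
total = sum(counts.values())

# Joint distribution P(A,B): four probabilities that sum to 1.
joint = {ab: Fraction(n, total) for ab, n in counts.items()}


def marginal(joint, axis):
    """Sum the joint over the other attribute (axis 0 = A, axis 1 = B)."""
    out = {}
    for ab, p in joint.items():
        out[ab[axis]] = out.get(ab[axis], 0) + p
    return out


p_a = marginal(joint, 0)  # P(A): summed over both colours
p_b = marginal(joint, 1)  # P(B): summed over both bags

# Under independence the joint would equal the product of the marginals.
independent = {(a, b): p_a[a] * p_b[b] for a in p_a for b in p_b}

for ab in joint:
    print(ab, float(joint[ab]), float(independent[ab]))
```

Each marginal is itself a two-outcome (binomial-type) model, which is one
way to read "a product of two independent binomial models".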
Sorry...
I think I can see how p(a) could be the result of a binomial distribution,
but I have no way to independently assess p(a), so I just find its
proportion out of A.
>You can approach significance testing in a variety of ways:
>
>1. "Classical":
> Fit three models, usually using maximum likelihood, P(A), P(B)
>(binomial), and P(A,B) (multinomial). Pick your test statistic which
>will measure the model error. Note that most statistics (such as X^2
>or G^2) can be interpreted as measuring the divergence between either
>two counts or between two probabilities, with some correction.
G^2 ?
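For reference, G^2 usually denotes the likelihood-ratio statistic,
G^2 = 2 * sum(O * ln(O/E)), the log-likelihood counterpart of Pearson's
X^2 = sum((O-E)^2 / E), where O are the observed counts and E the counts
expected under independence. A minimal sketch, assuming the 2x2 counts
from the two-bag example further down in this message (4, 46, 1, 99); the
function names are illustrative only:

```python
from math import log


def expected_counts(table):
    """Counts expected under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson's X^2 = sum((O - E)^2 / E) over all cells."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))


def g2(table):
    """Likelihood-ratio G^2 = 2 * sum(O * ln(O / E)); empty cells add 0."""
    exp = expected_counts(table)
    return 2.0 * sum(o * log(o / e)
                     for obs_row, exp_row in zip(table, exp)
                     for o, e in zip(obs_row, exp_row)
                     if o > 0)


# Rows: left bag, right bag. Columns: red, blue.
table = [[4, 46], [1, 99]]
print(pearson_x2(table), g2(table))
```

With expected counts this small the chi-square approximation for either
statistic is rough, so the numbers are best read as a comparison of the
two formulas rather than as a reliable test.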
Just to be explicitly clear about what I am doing, here is an example of
my calculations... (feel free to skip).
We have two bags (one left, one right) and two colors of ball (red and
blue).
A = {left,right},
B = {red,blue}
The bag on the left has 50 balls, the bag on the right has 100. 5 of the
balls are red. Four red balls are on the left (in our data); does this
indicate a significant interaction between the left bag and red balls?
n = 5
k = 4
p = 50/150
binom = choose(n,k) * p**k * (1-p)**(n-k)
= 5 * (1/3)**4 * (2/3)**1
= 5 * 0.01234567 * 2/3
= 0.041
p = 0.041 means the association is significant at the 0.05 level
Ahh... Actually I need to ask "what is the probability of this
distribution or an even more extreme distribution".
I use the mean (np) to see which way extremity lies...
np = 5*50/150 = 1+2/3
k>np
therefore sum binom for k = k to n
in this case binom for k=4 + binom for k=5
= 0.041 + 0.004 = 0.045
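As a cross-check of the arithmetic, here is a minimal Python sketch of
the same one-sided binomial calculation. It assumes Python 3.8+ for
`math.comb`; the function names are illustrative only.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k): the observed outcome or anything more extreme."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


n, k, p = 5, 4, 50 / 150

print(comb(n, k))           # 5 ways to choose which 4 of the 5 reds
print(n * p)                # mean np, about 1.67, so k = 4 is above it
print(binom_pmf(k, n, p))   # 10/243, about 0.041
print(upper_tail(k, n, p))  # 11/243, about 0.045
```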
In the case where np and n(1-p) are > 10 I use the normal approximation
with correction for continuity.
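A minimal sketch of that approximation for the upper tail, using only
the standard library. The sample values (n = 60, p = 1/3, k = 28) are
hypothetical, chosen so that np = 20 and n(1-p) = 40 both exceed 10; the
exact tail is included only for comparison.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def upper_tail_normal(k, n, p):
    """Approximate P(X >= k), X ~ Binomial(n, p), with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    # P(X >= k) is approximated by the normal area above k - 0.5.
    z = (k - 0.5 - mu) / sigma
    return 1.0 - normal_cdf(z)


def upper_tail_exact(k, n, p):
    """Exact P(X >= k) by summing the binomial probabilities."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, k, p = 60, 28, 1 / 3  # hypothetical values, not from my data
print(upper_tail_normal(k, n, p), upper_tail_exact(k, n, p))
```

The two values should be close but not identical; the gap is the price
of the approximation.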
gasp!
> - Assume that P(A)P(B) is null and P(A,B) is alternative, and use
>Fisher's exact test (or permutation testing)
> - Assume that P(A,B) is null and P(A)P(B) is alternative, and use
>Pearson's goodness-of-fit (or nonparametric bootstrap)
> - Perform cross-validation, and see how often one model is better
>than another on unseen data, P(A)P(B) or P(A,B)
Unseen data = data not used for parameterization? This will be useful as
I don't know that I have all the data.
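For the two-bag example, the Fisher's exact test from the quoted list
could be sketched as a hypergeometric tail: the 5 red balls are placed
without replacement among 150 positions, 50 of which are in the left bag.
This is a one-sided version only, and the function names are
illustrative, not taken from any library.

```python
from math import comb


def hypergeom_pmf(k, total, marked, drawn):
    """P(exactly k marked items among `drawn` taken without replacement)."""
    return comb(marked, k) * comb(total - marked, drawn - k) / comb(total, drawn)


def fisher_upper_tail(k, total, marked, drawn):
    """One-sided p-value: P(at least k marked items in the drawn group)."""
    upper = min(marked, drawn)
    return sum(hypergeom_pmf(i, total, marked, drawn)
               for i in range(k, upper + 1))


# 150 balls in total, 5 of them red, 50 in the left bag, 4 reds seen there.
p_value = fisher_upper_tail(k=4, total=150, marked=5, drawn=50)
print(p_value)
```

Because the balls are not replaced, this tail comes out slightly smaller
than the binomial one for the same counts.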
>We explored the latter two approaches in our paper at
>http://www.ailab.si/aleks/Int/jakulin-bratko-ICML2004.pdf.
OK. I find the idea of attribute clustering given this framework very
interesting. It is where my own thoughts were going, based on the
association measure and grouping of attributes.
I find the sentence in the discussion, "using P-values alone, we would
accept a model with rare but grave errors, but reject a model with
frequent but negligible ones" very curious, but perhaps I skipped over too
much of the paper to see that this has nothing to do with the current
discussion. I would love to understand this whole area better.
>2. Bayesian-style:
> - Compute three posterior hypotheses, P(xA|D), P(xB|D), P(xAB|D);
>here, xA,xB and xAB are the parameters. Examine the Bayes factor B(D)
>= P(D|xAB) / P(D|xA,xB), which can be converted into a "probability"
>through P(xA,xB|D) = 1/(1+B(D))
> - Compute P(xA|D) and P(xB|D), and choose your test statistic
>T(D,xA,xB) assessing the loss of P(xA)P(xB) on a sample D. The
>Bayesian p-value will be 1-Pr{ T(D') < T(D) | xA,xB }, where D' is a
>random sample of data from P(A,B|xA,xB), and D is the original sample.
>
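One way the Bayes factor in the first item could be sketched, under
assumptions that are not in the quoted text: uniform priors (Beta(1,1)
on each margin, Dirichlet(1,1,1,1) on the four cells), parameters
integrated out rather than fixed, and equal prior weight on both models.
The counts are those of the two-bag example and the function names are
illustrative.

```python
from math import exp, lgamma


def log_evidence_binary(successes, failures):
    """Log marginal likelihood of a binary sequence, uniform Beta(1,1) prior."""
    return (lgamma(successes + 1) + lgamma(failures + 1)
            - lgamma(successes + failures + 2))


def log_evidence_cells(cells):
    """Log marginal likelihood of a categorical sequence, uniform Dirichlet prior."""
    k = len(cells)
    n = sum(cells)
    return lgamma(k) + sum(lgamma(c + 1) for c in cells) - lgamma(n + k)


# Cells: left/red, left/blue, right/red, right/blue (two-bag example).
cells = [4, 46, 1, 99]

# Interaction model P(A,B): one four-cell multinomial.
log_joint = log_evidence_cells(cells)

# Independence model P(A)P(B): bag margin (50 vs 100), colour margin (5 vs 145).
log_indep = log_evidence_binary(50, 100) + log_evidence_binary(5, 145)

bayes_factor = exp(log_joint - log_indep)
prob_independent = 1.0 / (1.0 + bayes_factor)
print(bayes_factor, prob_independent)
```

The result depends on the choice of priors, so it is a sketch of the
mechanics rather than a definitive answer for these counts.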
>Of course, there are many variants. I have assumed that you have a
>specific sample of data, which isn't useful for basic Neyman-Pearson
yes
>hypothesis testing that only cares about sample size. Secondly, I have
I don't know what this is :(
>assumed you're looking for a probability-like quantity resulting from
>the test: this excludes model selection approaches such as BIC, AIC,
>MDL, DIC and so on.
Yes. MDL = minimum description length?
I wish I had a bigger tool box!
I find all the above very interesting, but I like the simplicity of my
existing test. Do you think I need to change my approach?
I would like to try clustering my data by simply merging attributes and
seeing if I can increase the observed 'association' between groups of
attributes.
Actually I have already done this: as my attributes are already organized
into a hierarchy, I have a convenient choice of groupings.
I am having a hell of a time understanding the results though.
Thanks very much for your help,
All the best,
Dan.
>
>