Re: binomial 'association' measure?

From: Dan Bolser (dmb_at_mrc-dunn.cam.ac.uk)
Date: 10/03/04


Date: Sun, 3 Oct 2004 15:56:28 +0100


Hi, thanks very much for the reply. Thanks especially for the paper
reference. The whole paper is very exciting for me! I find the formula for
the k-way interaction information very interesting!

One thing I should state, to give you a better understanding of my
problem(s): I am a biochemist by training and I dropped maths a long
time ago. Today I am studying bioinformatics, so I have a great need for
a good statistical understanding. In this respect I have improved my maths
somewhat, but simple 'basics' can often be missing from my maths toolbox,
because I never had a ground-up education in more advanced maths.

I have asked some questions below and made some comments.

On Sat, 2 Oct 2004, it was written:

>Dan Bolser wrote:
>> Hi, I am trying to calculate the significance of association of two
>> events by counting their coincidence and correcting for their
>> occurrence.
>>
>> I have two ways to look at the problem in binomial terms, and I want
>> to know if they are equivalent.
>
>What you have here are two models:
>
>* P(A,B) is multinomial, with four probabilities
>* P(A)P(B), a product of two independent binomial models

I think what you are saying is,

model 1) A and B 'interact',
model 2) A and B are independent ?

I don't understand why P(A)P(B) is "a product of two independent binomial
models".

I am using the binomial distribution to assess the probability of a
particular instance of AB (ab) given the (marginalized?) probability of an
instance of A (a) and the (marginalized?) probability of an instance of B
(b). Why are p(a) and p(b) "independent binomial models"?

Marginalization is probably one of those basics I should know... I think
the explanation in your paper is quite clear... Hopefully I use the
term correctly above or else confusion is mounting ;)

If I want the 'independent' probability of an instance of A (a), I sum
over the probabilities of all instances of ab for every instance of B (b)
(aB?).
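
To make the summing step concrete, here is a minimal sketch of
marginalization on a 2x2 table. The counts are the ones from the two-bag
example in this message (4 red and 46 blue on the left, 1 red and 99 blue
on the right); the variable names are only illustrative.

```python
# Joint counts for (A, B); numbers taken from the two-bag example.
counts = {
    ("left", "red"): 4,
    ("left", "blue"): 46,
    ("right", "red"): 1,
    ("right", "blue"): 99,
}
total = sum(counts.values())

# Joint probabilities p(a, b), estimated as simple proportions.
joint = {ab: c / total for ab, c in counts.items()}

# Marginal p(a): sum p(a, b) over every value b of B.
p_a = {}
for (a, b), p in joint.items():
    p_a[a] = p_a.get(a, 0.0) + p

# Marginal p(b): sum p(a, b) over every value a of A.
p_b = {}
for (a, b), p in joint.items():
    p_b[b] = p_b.get(b, 0.0) + p

print(p_a)
print(p_b)
```

Under independence the model for the joint table would be p(a) * p(b)
for each cell, which is what the P(A)P(B) model refers to.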

Sorry...

I think I can see how p(a) could be the result of a binomial distribution,
but I have no way to independently assess p(a), so I just find its
proportion out of A.

>You can approach significance testing in a variety of ways:
>
>1. "Classical":
> Fit three models, usually using maximum likelihood, P(A), P(B)
>(binomial), and P(A,B) (multinomial). Pick your test statistic which
>will measure the model error. Note that most statistics (such as X^2
>or G^2) can be interpreted as measuring the divergence between either
>two counts or between two probabilities, with some correction.

G^2 ?
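
As far as I can tell, G^2 usually denotes the likelihood-ratio statistic,
G^2 = 2 * sum(O * ln(O/E)), with O the observed and E the expected counts,
while X^2 is Pearson's sum((O - E)^2 / E). A rough sketch comparing the two
on a 2x2 table follows; the counts are those of the two-bag example in this
message, and the expected counts assume independence of A and B.

```python
from math import log

# Observed 2x2 counts: rows = bag (left, right), columns = colour (red, blue).
observed = [[4, 46], [1, 99]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

x2 = 0.0  # Pearson chi-squared statistic
g2 = 0.0  # likelihood-ratio (G) statistic
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence: row total * column total / N.
        e = row_totals[i] * col_totals[j] / n_total
        x2 += (o - e) ** 2 / e
        if o > 0:  # a zero cell contributes nothing to G^2
            g2 += 2.0 * o * log(o / e)

print("X^2 =", round(x2, 3))
print("G^2 =", round(g2, 3))
```

Both statistics are normally compared against a chi-squared distribution
with 1 degree of freedom for a 2x2 table, although with expected counts
this small that approximation is doubtful.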

Just to be explicitly clear about what I am doing, here is an example of
my calculations... (feel free to skip).

We have two bags (one left one right) and two colors of ball, (red and
blue).

A = {left,right},
B = {red,blue}

The bag on the left has 50 balls, the bag on the right has 100. 5 of the
balls are red. Four red balls are on the left (in our data); does this
indicate a significant interaction between the left bag and red balls?

n = 5
k = 4
p = 50/150

binom = choose(n,k) * p**k * (1-p)**(n-k)
      = 5 * (1/3)**4 * (2/3)**1
      = 5 * 0.01234567 * 2/3
      = 0.041

p = 0.041 means association is significant at 0.05 level

Ahh... Actually I need to ask "what is the probability of this
distribution or an even more extreme distribution".

I use the mean (np) to see which way extremity lies...

np = 5*50/150 = 1+2/3

k>np

therefore sum binom for k = k to n

in this case binom for k=4 + binom for k=5

= 0.041 + 0.004 = 0.045
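
The tail sum described above can be checked with a short script. This is
only a sketch of the one-sided binomial calculation with n = 5, k = 4 and
p = 50/150; note that choose(5,4) = 5, so the two terms come to about
0.041 and 0.004. The function names are my own.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n, k, p = 5, 4, 50 / 150

mean = n * p              # expected number of red balls on the left
pmf_k = binom_pmf(k, n, p)
tail = upper_tail(k, n, p)

print("np        =", round(mean, 3))
print("P(X = 4)  =", round(pmf_k, 4))
print("P(X >= 4) =", round(tail, 4))
```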

In the case where np and n(1-p) are > 10 I use the normal approximation
with correction for continuity.
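
A minimal sketch of that normal approximation, next to the exact binomial
tail for comparison. The values n = 60, k = 28 and p = 1/3 are made up
purely so that np and n(1-p) are both above 10; they are not from my data.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def upper_tail_normal(k, n, p):
    """Approximate P(X >= k) for X ~ Binomial(n, p), continuity-corrected."""
    mean = n * p
    sd = sqrt(n * p * (1.0 - p))
    # Continuity correction: P(X >= k) is treated as P(Y > k - 0.5).
    return 1.0 - normal_cdf((k - 0.5 - mean) / sd)

def upper_tail_exact(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical example values (np = 20, n(1-p) = 40).
n, k, p = 60, 28, 1 / 3

approx = upper_tail_normal(k, n, p)
exact = upper_tail_exact(k, n, p)

print("normal approximation:", round(approx, 4))
print("exact binomial tail: ", round(exact, 4))
```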

gasp!

> - Assume that P(A)P(B) is null and P(A,B) is alternative, and use
>Fisher's exact test (or permutation testing)
> - Assume that P(A,B) is null and P(A)P(B) is alternative, and use
>Pearson's goodness-of-fit (or nonparametric bootstrap)
> - Perform cross-validation, and see how often one model is better
>than another on unseen data, P(A)P(B) or P(A,B)

Unseen data = data not used for parameterization? This will be useful as
I don't know that I have all the data.
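
For comparison with my binomial test, the one-sided Fisher's exact test
mentioned in the quoted list can be sketched with the hypergeometric
distribution. This assumes the two-bag example from this message: 150
balls in total, 50 of them in the left bag, 5 red balls, 4 of which are
on the left. The function names are my own.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x marked items among n drawn from N items, K of them marked."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def fisher_upper_tail(x, N, K, n):
    """One-sided p-value: probability of x or more marked items."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(x, min(K, n) + 1))

N = 150  # all balls
K = 50   # balls in the left bag ("marked")
n = 5    # red balls ("drawn")
x = 4    # red balls observed in the left bag

p_value = fisher_upper_tail(x, N, K, n)
print("one-sided Fisher p-value:", round(p_value, 4))
```

Unlike the binomial calculation, this conditions on both margins of the
table, so the two p-values need not agree exactly.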

>We explored the latter two approaches in our paper at
>http://www.ailab.si/aleks/Int/jakulin-bratko-ICML2004.pdf.

OK. I find the idea of attribute clustering given this framework very
interesting. It is where my own thoughts were going, based on the
association measure and grouping of attributes.

I find the sentence in the discussion, "using P-values alone, we would
accept a model with rare but grave errors, but reject a model with
frequent but negligible ones" very curious, but perhaps I skipped over too
much of the paper to see that this has nothing to do with the current
discussion. I would love to understand this whole area better.

>2. Bayesian-style:
> - Compute three posterior hypotheses, P(xA|D), P(xB|D), P(xAB|D);
>here, xA,xB and xAB are the parameters. Examine the Bayes factor B(D)
>= P(D|xAB) / P(D|xA,xB), which can be converted into a "probability"
>through P(xA,xB|D) = 1/(1+B(D))
> - Compute P(xA|D) and P(xB|D), and choose your test statistic
>T(D,xA,xB) assessing the loss of P(xA)P(xB) on a sample D. The
>Bayesian p-value will be 1-Pr{ T(D') < T(D) | xA,xB }, where D' is a
>random sample of data from P(A,B|xA,xB), and D is the original sample.
>
>Of course, there are many variants. I have assumed that you have a
>specific sample of data, which isn't useful for basic Neyman-Pearson

yes

>hypothesis testing that only cares about sample size. Secondly, I have

I don't know what this is :(

>assumed you're looking for a probability-like quantity resulting from
>the test: this excludes model selection approaches such as BIC, AIC,
>MDL, DIC and so on.

Yes. MDL = minimum description length?

I wish I had a bigger tool box!

I find all the above very interesting, but I like the simplicity of my
existing test. Do you think I need to change my approach?

I would like to try clustering my data by simply merging attributes and
seeing if I can increase the observed 'association' between groups of
attributes.

Actually I have already done this: as my attributes are already organized
into a hierarchy, I have a convenient choice of groupings.

I am having a hell of a time understanding the results though.

Thanks very much for your help,

All the best,
Dan.

>
>


