Re: binomial 'association' measure?

From: Dan Bolser (dmb_at_mrc-dunn.cam.ac.uk)
Date: 10/10/04


Date: Sun, 10 Oct 2004 12:55:28 +0100
To: Aleks Jakulin <a_jakulin@hotmail.com>

On Sat, 9 Oct 2004, it was written:

>D. Bolser wrote:
>> Thanks for the help, but I am still lost.
>>
>> I want to measure the bias in a certain distribution of events. I
>> want to know when the bias is more than you expect by
>> chance, and I would like to quantify that bias.
>
>Presuming that we're still talking about the same data...

Yes

>
>Dan has three discrete variables, one is binary ("group A/B"), and the
>other two are counts, one being "successes" and the other "total
>attempts". This kind of count data isn't appropriate for Fisher's
>exact test, or for goodness-of-fit. Count data is usually not modelled
>with binomial or multinomial distributions, but with e.g., Poisson.

But I thought that this data fits a 'standard' (I don't know what that is
any more) 2 way contingency table?

Just like Eczema:{Yes,No}, HayFever:{Yes,No} and the counts for each
instance (a particular combination of attributes).

>Dan has been correctly thinking about examining the ratio of successes
>to total attempts across the groups as a simple approach to testing,
>forcing a kind of a binomial model. This is a reasonable
>approximation,

It is? If that is the case I am overjoyed! I can easily calculate the
binomial PDF and the normal approximation thereof (where appropriate) to
get a nice, clean (because I understand where it came from) p-value.

>and I recommend the test for equal proportions, prop.test in R.

What is this now? I mean I was using the test, but what is it based on?

>Aleks

The way I understood my problem was this... Each attribute of an instance
of my data has a binomial distribution, as I can boil each attribute down
to a yes/no question.

So, for each 'pick', for example:

Picker:{Male,Female},
colorOfBall:{Blue,Other}.

Actually I have lots of attribute values, but I am only interested in
looking at pairwise associations, so making an 'Other' value for 'not
blue' seems OK.

So, when an M picking a B counts as a 'success' (141 times)...

.       B      O      t
M     141    420    561
F     928  13525  14453
t    1069  13945  15014

The binomial distribution of Picker, N=1069, p= 561/15014, K=141
The binomial distribution of Color, N= 561, p=1069/15014, K=141

So we can ask how extreme the number of successful Pickers is given the
color blue, N=1069, p=561/15014, K=141. As Np ~ 40 is less than K, we sum
the binomial probability from K=K to K=1069 (more extreme).

And we can ask how extreme the number of successful Colors is given
the picker Male, N=561, p=1069/15014, K=141. Now Np is the same (~40), so
we sum the binomial from K=K to K=561 (if Np were > K we would go from K=K
to K=0, i.e. more extreme in the other direction).
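
For concreteness, here is a minimal sketch of the two tail sums described above, together with the normal approximation. It uses only the Python standard library. The function names (log_binom_pmf, binom_upper_tail, normal_upper_tail) are made up for illustration, and the continuity correction of 0.5 is an assumption about how one would usually set up the approximation.

```python
from math import lgamma, log, exp, sqrt, erfc

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

def normal_upper_tail(k, n, p):
    """Normal approximation to P(X >= k), with a 0.5 continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * erfc(z / sqrt(2))

# Counts from the table: M row total, B column total, M-and-B cell, grand total.
total = 15014
males, blues, male_blue = 561, 1069, 141

expected = blues * males / total  # Np, the same in both views

# Pickers given the color blue: N=1069, p=561/15014, K=141
p_given_blue = binom_upper_tail(male_blue, blues, males / total)
approx_given_blue = normal_upper_tail(male_blue, blues, males / total)

# Colors given the picker male: N=561, p=1069/15014, K=141
p_given_male = binom_upper_tail(male_blue, males, blues / total)

print(expected, p_given_blue, approx_given_blue, p_given_male)
```

With K this far above Np, both exact tails are extremely small, and the normal approximation comes out far smaller than the exact sum, so it should not be trusted this deep in the tail.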

Does this look sane?

However, in either case we assume the other case to be fixed, and since
both cases have a distribution of their own, we need to ask what the
probability of having a significantly different distribution is, hence
prop.test.

How do I explicitly calculate the above? Is that Fisher's test?
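
One way to calculate it explicitly: as far as I understand, for two groups the test for equal proportions is Pearson's chi-squared test on the 2x2 table, with Yates' continuity correction applied by default in prop.test. The sketch below is written under that assumption and is not R's implementation. The function name chi2_2x2 is hypothetical, and the p-value uses the closed form for a chi-squared variable with 1 degree of freedom.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared statistic and p-value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction, never pushing the difference below zero
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of chi-squared with 1 d.f.: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# M/B = 141, M/O = 420, F/B = 928, F/O = 13525
stat, p_value = chi2_2x2(141, 420, 928, 13525)
print(stat, p_value)
```

Unlike the two binomial sums, this treats neither attribute as fixed in advance. It compares the proportion of B among M (141/561) with the proportion of B among F (928/14453).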

How can I visualize the joint PDF? Using a 3D contour plot I guess, but
how do I calculate the probability at each point in the 2D count space? Is
it a multinomial over the combinatorial set of possible outcomes? Does
this make sense, or have I gone totally off track?

Does Fisher's test amount to finding the probability of having a more
extreme distribution in this 2D landscape of probability?
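
A rough sketch of that idea, assuming both margins of the table are held fixed: the table then has only one free cell (say the M/B count), that cell follows a hypergeometric distribution, and the one-sided Fisher p-value is the sum of the probabilities of all tables at least as extreme. The helper names below are made up for illustration, and only the standard library is used.

```python
from math import lgamma, exp

def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_pmf(k, row1, col1, n):
    """P(top-left cell == k) given row total row1, column total col1, grand total n."""
    return exp(log_choose(row1, k)
               + log_choose(n - row1, col1 - k)
               - log_choose(n, col1))

def fisher_upper_tail(a, row1, col1, n):
    """One-sided Fisher p-value: P(top-left cell >= a) with both margins fixed."""
    k_min = max(0, row1 + col1 - n)   # smallest count the margins allow
    k_max = min(row1, col1)           # largest count the margins allow
    return sum(hypergeom_pmf(k, row1, col1, n)
               for k in range(max(a, k_min), k_max + 1))

# M row total 561, B column total 1069, grand total 15014, observed M/B = 141
p_fisher = fisher_upper_tail(141, 561, 1069, 15014)
print(p_fisher)
```

Under this conditioning the "2D landscape" collapses to a single line of possible tables, indexed by the one free cell.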

And there is no simple way to combine the binomial distributions, right?

It is these last questions that give me trouble - I am not sure what to do
or what is happening....

Does my explanation above make sense?

Cheers,
Dan.

>
>


