Re: Estimating the mean of the cumulative hypergeometric?



On Jan 12, 9:21 pm, petertwocakes <petertwoca...@xxxxxxxxxxxxxx>
wrote:
Hi,

I'm trying to save some computation when calculating the cumulative
probability of the hypergeometric distribution by first estimating the
mean and working forwards/backwards from there.

h(N, m, n, k) = [ kCm ] [ N-kCn-m ] / [ NCn ]

Did you mean [ mCk ] [ N-mCn-k ] / [ NCn ]? (Or maybe you're just
designating the variables differently from what I was expecting...)
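
For reference, here is the form I was expecting, written out. The labelling is my assumption, since the post doesn't define the variables: N is the population size, m the number of "successes" in the population, n the sample size, and k the number of successes in the sample.

```latex
h(k;\,N,m,n) \;=\; \frac{\binom{m}{k}\,\binom{N-m}{\,n-k\,}}{\binom{N}{n}},
\qquad \max(0,\;n-(N-m)) \;\le\; k \;\le\; \min(m,\,n),
\qquad \mathrm{E}[k] \;=\; \frac{n\,m}{N}.
```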


mean = n * (m/N)

So for instance, if mean = 1000 and I wanted to know the cumulative P
for k <= 1100, I'd start with a cumulative value of 0.5 for k=1000
then just add the values from k = 1001 to 1100.
(In tandem with skipping the many values close to zero, I'm typically
only having to compute about 25% of the values I would have to.)

I understand why for small values of m the distribution is too coarse
or skewed for this to be accurate, but even in the example above with
large m, I'm getting a significant error: i.e. when I brute-force add
up all the values up to the estimated mean I'm getting 0.509 instead of
0.5. Therefore, if I used this shortcut, I'd be out by 0.009 for any
value of k.

Is assuming mean = n * (m/N) always a no-no, or are there times
when it's a reliably accurate estimate?

I'm sure you already know this, but just in case: you realise that you
don't need to calculate the binomial coefficients afresh for each
value of k, right? You can just multiply what you had for k - 1 by the
appropriate ratio, thus saving a huge amount of computation.
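
A minimal sketch of that ratio trick, in case it helps. It assumes the same labelling as the [ mCk ] [ N-mCn-k ] / [ NCn ] form (N = population size, m = successes in the population, n = sample size), under which P(k) / P(k-1) = (m-k+1)(n-k+1) / (k (N-m-n+k)). The function names are made up for illustration. One term is evaluated directly via log-gamma near the mode (so it doesn't underflow), and every other term comes from a single multiply or divide.

```python
import math

def log_choose(a, b):
    """log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def pmf_direct(k, N, m, n):
    """P(X = k) evaluated from scratch; used only once per CDF call."""
    return math.exp(log_choose(m, k) + log_choose(N - m, n - k) - log_choose(N, n))

def step_ratio(k, N, m, n):
    """P(X = k) / P(X = k - 1)."""
    return (m - k + 1) * (n - k + 1) / (k * (N - m - n + k))

def hypergeom_cdf(k, N, m, n):
    """P(X <= k), summing terms generated by the ratio recurrence."""
    lo = max(0, n - (N - m))   # smallest possible value of X
    hi = min(m, n)             # largest possible value of X
    if k < lo:
        return 0.0
    if k >= hi:
        return 1.0
    # Start at the mode (or at k, if k lies below it) so the first
    # term is as large as possible.
    mode = (n + 1) * (m + 1) // (N + 2)
    start = min(k, max(lo, min(hi, mode)))
    p_start = pmf_direct(start, N, m, n)
    total = p_start
    # Walk downwards: P(j - 1) = P(j) / ratio(j).
    p = p_start
    for j in range(start, lo, -1):
        p /= step_ratio(j, N, m, n)
        total += p
    # Walk upwards to k: P(j) = P(j - 1) * ratio(j).
    p = p_start
    for j in range(start + 1, k + 1):
        p *= step_ratio(j, N, m, n)
        total += p
    return total

if __name__ == "__main__":
    # Hypothetical parameters chosen so that n * m / N = 1000.
    print(hypergeom_cdf(1000, 100000, 10000, 10000))
```

The downward loop could be cut short once the terms become negligible, which is the "skipping the many values close to zero" saving already mentioned; I've left it running to the lower limit to keep the sketch short.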