Re: Estimating the mean of the cumulative hypergeometric?
- From: Matt <matt271829-news@xxxxxxxxxxx>
- Date: Wed, 14 Jan 2009 19:47:37 -0800 (PST)
On Jan 14, 6:00 pm, petertwocakes <petertwoca...@xxxxxxxxxxxxxx>
wrote:
On 13 Jan, 20:19, Matt <matt271829-n...@xxxxxxxxxxx> wrote:
On Jan 13, 3:43 pm, petertwocakes <petertwoca...@xxxxxxxxxxxxxx>
wrote:
This is my routine for calculating the cumulative _Binomial_
distribution, with relacement, given n and p:
double q = 1-p, c = pow(q,n); // i.e. when k=0
double cp = c; // cumulative probability
for(double k=1; k<=n; k++)
{
c = ((n-(k-1))/k)*c*(p/q);
cp += c;
}
It's fast, and (for me) clear to understand, and as it's iterative,
the cost of calculating the cumulative probability is little more than
for calculating a specific probability.
So... I REALLY thought I would be able to modify this to work with no-
replacement, after all the only difference is that p&q are changing as
a result of N and m being reduced.
(m = number successes in N; k = number successes in sample(n))
Surely, in the loop above, if I could recalculate p&q by how N and m
are changing, then this would give the same result as the
hypergeometric??
I don't see that this is an efficient way to approach it. p and q are
changing, sure, but they're changing in a way that depends on the
history of the previous selections, and you don't want to be tracking
all those possibilities individually.
Your loop code for the hypergeometric should not be that different
from the binomial. You keep a running count of the product of the
binomial coefficients involving k: this is what I was alluding to
earlier. Each iteration involves multiplying by two ratios, something
like c = c*(m - k)/(k + 1)*(n - k)/(N - m - n + k + 1), depending on
exactly how you implement it. Of course, you can pre-calculate N - m -
n + 1. You do also have the overhead of initialising this multiplier
to (N - m)C(n) and calculating the denominator Ncn.
Thanks Matt, I've got your's and Ray's iterative method working now.
It's so fast compared to the brute force method of summing across all
ks that the overhead you mention isn't a problem. The only catch is
that you have to ensure that none of the early values of k from k0
onward are zero, as naturally the rest of the series will be locked at
zero when that happens. I solved this by doing a binary search for the
first non-zero value of k.
Not that it matters if you've got it working to your satisfaction, but
I don't follow what you mean here. I assume you don't actually mean
zero values "of" k (since there's only one). If you mean out-of-range
values of k then there's no need to do a binary search; you can
calculate the valid range easily using max and min. If you mean
vanishingly small probabilities for certain values of k then I don't
see how a binary search can profitably be used when you are
calculating the probabilities recursively. Or maybe in the binary
search phase you aren't calculating them recursively? Is that really
worth doing? Doesn't seem so to me... but I could be be missing
something.
.
- Follow-Ups:
- Re: Estimating the mean of the cumulative hypergeometric?
- From: petertwocakes
- Re: Estimating the mean of the cumulative hypergeometric?
- References:
- Estimating the mean of the cumulative hypergeometric?
- From: petertwocakes
- Re: Estimating the mean of the cumulative hypergeometric?
- From: Matt
- Re: Estimating the mean of the cumulative hypergeometric?
- From: petertwocakes
- Re: Estimating the mean of the cumulative hypergeometric?
- From: petertwocakes
- Re: Estimating the mean of the cumulative hypergeometric?
- From: Matt
- Re: Estimating the mean of the cumulative hypergeometric?
- From: petertwocakes
- Estimating the mean of the cumulative hypergeometric?
- Prev by Date: Re: what would euler do with
- Next by Date: -- f(k) = k^(n-1) (mod n)
- Previous by thread: Re: Estimating the mean of the cumulative hypergeometric?
- Next by thread: Re: Estimating the mean of the cumulative hypergeometric?
- Index(es):
Relevant Pages
|