Re: random sampling of very large populations
From: Herman Rubin (hrubin_at_odds.stat.purdue.edu)
Date: 10/14/04
- Next message: David Alexander: "simple time series question"
- Previous message: Duncan Murdoch: "Re: random sampling of very large populations"
- In reply to: George Kahrimanis: "Re: random sampling of very large populations"
- Next in thread: George Kahrimanis: "Re: random sampling of very large populations"
- Reply: George Kahrimanis: "Re: random sampling of very large populations"
- Reply: Gyro Funch: "Re: random sampling of very large populations"
- Messages sorted by: [ date ] [ thread ]
Date: 13 Oct 2004 20:22:02 -0500
In article <416D6AE3.AC5D5116@hol.gr>,
George Kahrimanis <anakreon@hol.gr> wrote:
>gyro has asked:
>>I have a question regarding the proper approach to random sampling
>>within a very large population. My situation is as follows: I have a
>>population comprising three things: 'A', 'B', and 'C'. Suppose I have
>>1e9 'A's, 1e7 'B's, and 5e4 'C's. If I randomly sample 1e6 of this
>>population, how many 'A's, 'B's, and 'C's will I have?
>>For small populations, I have developed a MATLAB program that seems to
>>work. [...] For the case of interest, this program runs out of memory
>The normal ("Gaussian") approximation works for very large samples.
It is not that good. It can also depend on what you are looking for.
The error in the normal approximation corresponds to an
error of a little more than 1 in the random variable
generated.
In the above example, one can see what this means. I have
not computed precisely, but the expected number of B's
found in the sample is approximately 1e4, and the expected
number of C's is approximately 50. The corresponding standard
deviations are about 100 and 7. So for the number of C's,
at least, we would want to do better.
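For anyone who wants to check those rough figures, here is a minimal sketch of the arithmetic in Python. It assumes the sample is drawn without replacement, so each count is hypergeometric and the variance carries a finite-population correction; the names `population`, `n`, and `mean_sd` are made up for this illustration and are not from the original posts.

```python
import math

# Population sizes from the question; the sample size is 1e6.
population = {"A": 10**9, "B": 10**7, "C": 5 * 10**4}
n = 10**6
N = sum(population.values())

# Finite-population correction for sampling without replacement.
fpc = (N - n) / (N - 1)

def mean_sd(K):
    """Mean and standard deviation of the count for a class of size K."""
    p = K / N
    mean = n * p
    sd = math.sqrt(n * p * (1 - p) * fpc)
    return mean, sd

for label, K in population.items():
    mean, sd = mean_sd(K)
    print(f"{label}: mean = {mean:.1f}, sd = {sd:.2f}")
```

The values for B and C come out close to the approximate ones quoted above. With a mean near 50 and a standard deviation near 7, the count of C's is small enough that an error of about 1 is noticeable.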
It is not expensive to do good random number generation for
multinomial distributions, especially with a small number
of classes.
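As an illustration only (this is one simple approach, not necessarily the method meant above), the counts can be generated by drawing one item at a time and keeping only the per-class totals, so memory use does not depend on the population size. The function name `sample_counts` and its parameters are hypothetical. With `replace=True` the result is a multinomial draw; with `replace=False` each drawn item is removed, which gives the exact without-replacement (multivariate hypergeometric) counts.

```python
import random

def sample_counts(population, n, rng=None, replace=False):
    """Draw n items and return the number drawn from each class.

    population maps a class label to the number of items in that class.
    Only one counter per class is stored, never the items themselves.
    """
    if rng is None:
        rng = random.Random()
    remaining = dict(population)
    total = sum(remaining.values())
    if not replace and n > total:
        raise ValueError("sample larger than population")
    counts = {label: 0 for label in remaining}
    for _ in range(n):
        u = rng.randrange(total)  # index of a uniformly chosen item
        for label, size in remaining.items():
            if u < size:
                counts[label] += 1
                if not replace:
                    # Remove the drawn item from its class.
                    remaining[label] -= 1
                    total -= 1
                break
            u -= size  # skip past this class and try the next one
    return counts

if __name__ == "__main__":
    pop = {"A": 10**9, "B": 10**7, "C": 5 * 10**4}
    print(sample_counts(pop, 10**6, rng=random.Random(2004)))
```

The cost is one random integer per sampled item, so a sample of 1e6 needs about a million draws. Faster schemes exist that generate each class count directly from a conditional binomial or hypergeometric variate, but they need a generator for those distributions.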
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558