Re: random sampling of very large populations

From: Herman Rubin (hrubin_at_odds.stat.purdue.edu)
Date: 10/14/04


Date: 13 Oct 2004 20:22:02 -0500

In article <416D6AE3.AC5D5116@hol.gr>,
George Kahrimanis <anakreon@hol.gr> wrote:
>gyro has asked:

>>I have a question regarding the proper approach to random sampling
>>within a very large population. My situation is as follows: I have a
>>population comprising three things: 'A', 'B', and 'C'. Suppose I have
>>1e9 'A's, 1e7 'B's, and 5e4 'C's. If I randomly sample 1e6 of this
>>population, how many 'A's, 'B's, and 'C's will I have?

>>For small populations, I have developed a MATLAB program that seems to
>>work. [...] For the case of interest, this program runs out of memory

>The normal ("Gaussian") approximation works for very large samples.

It is not that good. It can also depend on what you are looking for.

The error in the normal approximation corresponds to an
error of a little more than 1 in the random variable
generated.

In the above example, one can see what this means. I have
not computed precisely, but the expected number of B's
found in the sample is approximately 1e4, and the expected
number of C's is approximately 50. The corresponding standard
deviations are about 100 and 7. An error of about 1 is small
against a standard deviation of 100 but not against one of 7,
so for the number of C's at least we would want to do better.
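
The rough figures above can be checked with a few lines of arithmetic.
The sketch below is only an illustration: the function name
class_moments is made up here, and it assumes the sample is drawn
without replacement, so it uses the hypergeometric mean and standard
deviation for each class.

```python
import math

def class_moments(pop_counts, n):
    """Return (mean, sd) of the sampled count for each class.

    Assumes n items are drawn without replacement from a population
    whose class sizes are given in pop_counts (hypothetical helper).
    """
    total = sum(pop_counts)
    # Finite-population correction for sampling without replacement.
    fpc = (total - n) / (total - 1)
    moments = []
    for count in pop_counts:
        p = count / total
        mean = n * p
        sd = math.sqrt(n * p * (1 - p) * fpc)
        moments.append((mean, sd))
    return moments

# The population from the question: 1e9 A's, 1e7 B's, 5e4 C's, sample 1e6.
for name, (mean, sd) in zip("ABC", class_moments([10**9, 10**7, 5 * 10**4], 10**6)):
    print(f"{name}: mean {mean:.1f}, sd {sd:.2f}")
```

For the B's this gives a mean near 9900 with a standard deviation near
99, and for the C's a mean near 49.5 with a standard deviation near 7,
in line with the approximate values quoted above.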

It is not expensive to do good random number generation for
multinomial distributions, especially with a small number
of classes.
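
As one illustration of how cheap this can be, here is a minimal sketch
that draws the sample one item at a time while tracking only the
remaining count in each class, so memory use does not depend on the
population size. This is not necessarily the method meant above: it is
a plain sequential draw without replacement, and the name sample_counts
is invented for the example.

```python
import random

def sample_counts(pop_counts, n, rng=None):
    """Draw n items without replacement; return the count drawn per class.

    Only the per-class totals are stored, never the individual items
    (hypothetical helper, written for illustration).
    """
    rng = rng or random.Random()
    remaining = list(pop_counts)
    total = sum(remaining)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the population size")
    drawn = [0] * len(remaining)
    for _ in range(n):
        # Choose one of the remaining items uniformly, then find its class.
        r = rng.randrange(total)
        for i, count in enumerate(remaining):
            if r < count:
                remaining[i] -= 1
                drawn[i] += 1
                break
            r -= count
        total -= 1
    return drawn

if __name__ == "__main__":
    # The case from the question: 1e6 draws from 1e9 A's, 1e7 B's, 5e4 C's.
    print(sample_counts([10**9, 10**7, 5 * 10**4], 10**6))
```

The cost is proportional to the sample size times the number of
classes, which is modest for three classes and 1e6 draws. Faster
methods exist (for example, generating each class count in turn from
its conditional distribution), but they need a good binomial or
hypergeometric generator.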

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu         Phone: (765)494-6054   FAX: (765)494-0558

