Re: Sampling question
- From: David Winsemius <doe_snot@xxxxxxxxxxx>
- Date: Fri, 13 Jul 2007 09:15:12 -0500
"Nick" <tulse04-news1@xxxxxxxxxxx> wrote in
news:kIadnXNruaaO0ArbnZ2dnUVZ8qijnZ2d@xxxxxx:
> "Bruce Weaver" <bweaver@xxxxxxxxxxxx> wrote in message
> news:YOGdnR3HS9v8TgvbnZ2dnUVZ_u-unZ2d@xxxxxxxxxxxxxx
>> sci.stat.math wrote:
>>> A simple question for experts, but I'm a novice, so excuse my
>>> ignorance. Suppose I have a huge population of recorded values for
>>> each individual (e.g., height/weight ratio, body fat %), and from
>>> that I would like to take a sample that represents the population.
>>> My questions:
>>> Should sampling be with or without replacement?
>>> How do I determine a sample size large enough to give a good
>>> estimate of the population values? Is there a specific formula or
>>> test?
>>> Thanks
>> When you sample from a "huge" population, there is very little
>> practical difference between sampling with and without replacement
>> (assuming the sample is small relative to the population). Let N be
>> the size of the huge population, n the size of the sample, and p the
>> probability of any given element being drawn. When you sample
>> randomly *with* replacement, p = 1/N on every draw. If you sample
>> randomly *without* replacement, p = 1/N on the first draw, 1/(N-1) on
>> the second draw, 1/(N-2) on the third draw, and so on until 1/(N-n+1)
>> on the last draw. Of course, those probabilities apply to elements
>> not yet drawn; once an element has been drawn, its probability of
>> being drawn again becomes 0. The point is that if N is huge relative
>> to n, 1/(N-n+1) does not differ substantially from 1/N.
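To put numbers on that last point, here is a minimal Python sketch that computes the ratio of the last-draw probability without replacement, 1/(N-n+1), to the with-replacement probability, 1/N. The function name `last_draw_inflation` and the (N, n) pairs are illustrative assumptions, not taken from the thread. Exact fractions are used, so for N = 1,000,000 and n = 1,000 the ratio is 1,000,000/999,001, roughly 1.001.

```python
from fractions import Fraction


def last_draw_inflation(N: int, n: int) -> Fraction:
    """Ratio of the last-draw probability without replacement, 1/(N-n+1),
    to the constant per-draw probability with replacement, 1/N.

    Hypothetical helper written for illustration."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    return Fraction(1, N - n + 1) / Fraction(1, N)


if __name__ == "__main__":
    # Illustrative (N, n) pairs; these are not data from the thread.
    for N, n in [(1_000_000, 1_000), (1_000, 100), (100, 50)]:
        ratio = last_draw_inflation(N, n)
        print(f"N={N:>9,}  n={n:>5,}  ratio={float(ratio):.4f}")
```

The ratio simplifies to N/(N-n+1), so it stays close to 1 only while n is a small fraction of N; at n = N/2 it is nearly 2.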
> I see that you are a medical statistician or work at a medical school.
> When dealing with live people, there are clear reasons for not
> re-interviewing or re-examining the same people, unless you were to
> put their data in twice.
You seem to be straying widely from the question originally posed. The
OP already has his data. There is no suggestion of re-testing or
re-interviewing. The question was whether sampled records should be
replaced, i.e. remain eligible to be drawn again.
> I have never heard of such a thing.
That only shows you to be completely unaware of the theory supporting
bootstrap methods, which resample the observed data with replacement.
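The nonparametric bootstrap treats the observed data as a stand-in for the population and repeatedly draws resamples of the same size *with* replacement, recomputing the statistic each time. Below is a minimal Python sketch of a percentile bootstrap for a mean. The function names `bootstrap_means` and `percentile_ci` and the body-fat values are hypothetical, made up for illustration, and the simple percentile interval is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Bootstrap replicates of the sample mean (hypothetical helper).

    Each resample draws len(data) values *with* replacement from the
    observed data, which plays the role of the population."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_ci(replicates, alpha=0.05):
    """Simple percentile interval taken from the sorted replicates."""
    s = sorted(replicates)
    lo = s[int((alpha / 2) * len(s))]
    hi = s[int((1 - alpha / 2) * len(s)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical body-fat percentages, made up for illustration.
    body_fat = [18.2, 22.5, 25.1, 19.8, 30.4, 27.3, 21.0, 24.6, 16.9, 28.8]
    reps = bootstrap_means(body_fat)
    print("sample mean:", statistics.fmean(body_fat))
    print("95% percentile interval:", percentile_ci(reps))
```

A fixed seed is passed to a private `random.Random` instance so that repeated runs give the same replicates without touching the global generator.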
> In nature one would have to either mark or remove a unit from the
> population to be sure that one didn't sample it twice.
That is necessary only when there is some potential for learning
effects, and that is not an issue when the data are already in hand.
When you resample such data without replacement, you create a sampling
frame that is unlike the best representation of the population, namely
the full dataset: you are biasing the sample by your _failure_ to
replace. Do not confuse issues in repeated-measurements designs with
the issues involved in resampling analysis methods.
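One concrete way to see why resampling methods replace: if you draw n values *without* replacement from a dataset of size n, every "resample" is just a reordering of the original data, so the statistic never changes and says nothing about sampling variability. The minimal Python sketch below contrasts the two schemes; the function name `resample_means` and the data values are hypothetical, chosen only for illustration.

```python
import random
import statistics


def resample_means(data, replace, n_resamples=1000, seed=0):
    """Means of full-size resamples of `data`, drawn with or without
    replacement (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_resamples):
        if replace:
            resample = rng.choices(data, k=n)  # duplicates allowed (bootstrap)
        else:
            resample = rng.sample(data, k=n)   # each value exactly once: a permutation
        means.append(statistics.fmean(resample))
    return means


if __name__ == "__main__":
    # Hypothetical measurements, made up for illustration.
    data = [18.2, 22.5, 25.1, 19.8, 30.4, 27.3, 21.0, 24.6, 16.9, 28.8]
    for replace in (True, False):
        spread = statistics.pstdev(resample_means(data, replace))
        print(f"replace={replace}: spread of resampled means = {spread:.3f}")
```

Without replacement the spread is zero (up to floating-point rounding), whereas with replacement it reflects the variability that the bootstrap is meant to estimate.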
--
David Winsemius