Choosing a Sample Size Formula



Hello,

It has been years since I cracked open a statistics book, and I need
help finding a formula to determine sample size.

I need to be able to choose a sample of datasets and, from those,
determine with a given degree of confidence whether there are data
integrity issues with an archival system. Data integrity is determined
by comparing a file's original checksum with the checksum calculated
from the sampled file. If they do not match, that file is assumed to be
corrupt.
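
For illustration, the per-file check described above could be sketched
in Python roughly as follows. The post does not say which checksum
algorithm the archive uses, so SHA-256 is an assumption here, and the
function names file_checksum and is_corrupt are hypothetical.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archive files need not fit in memory.

    The algorithm is an assumption; use whatever the archive recorded.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_corrupt(path, original_checksum, algorithm="sha256"):
    """Return True when the recomputed checksum differs from the stored one."""
    stored = original_checksum.strip().lower()
    return file_checksum(path, algorithm) != stored
```

A sampled file would then be counted as corrupt whenever
is_corrupt(path, stored_checksum) returns True.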

I know the number of files in the system. There are three areas that I
will need to sample: one has over 17,000,000 files, another has 9,057,
and another has 4,005,996. I will be able to get the exact counts from
the system's database. I found the following formula, but it seems you
need to know a lot about the data to determine the sample size.

n = (z*s/E)^2

n = sample size
z = the z score associated with the selected degree of confidence
s = the sample standard deviation from the pilot survey
E = the allowable error
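
As a rough illustration of how the quoted formula would be evaluated,
here is a minimal Python sketch. It assumes a two-sided confidence
level when looking up z, and the values s = 50 and E = 5 in the usage
note are placeholders, not figures from any pilot survey. The function
names z_score and sample_size are hypothetical. Whether this formula
is the right one for the problem is a separate question that the
sketch does not settle.

```python
import math
from statistics import NormalDist

def z_score(confidence):
    """Two-sided z score for a confidence level such as 0.95 or 0.99."""
    # A 95% two-sided interval leaves 2.5% in each tail, so look up 0.975.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size(confidence, s, E):
    """Evaluate n = (z*s/E)^2 and round up to a whole number of items."""
    z = z_score(confidence)
    return math.ceil((z * s / E) ** 2)
```

With the placeholder values s = 50 and E = 5, sample_size(0.95, 50, 5)
gives 385 and sample_size(0.99, 50, 5) gives 664, so raising the
confidence level raises the required sample size.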

I'm not sure how to determine the values for z, s, or E. I'm not even
sure this is the correct formula to use. I was thinking about using a
degree of confidence of .95 or .99, and maybe an allowable error of
+/-5 files, but these are just values I pulled out of the air.
Help with this would be greatly appreciated.

Wen
