Re: Estimation of variance of proportions



"Anon." <bob.ohara@xxxxxxxxxxxxxxxxx> wrote in

David Winsemius wrote:
<aubertot@xxxxxxxxxxxxxxx> wrote in
news:11931925.1151133920157.JavaMail.jakarta@xxxxxxxxxxxxxxxxxxxxxx:

Hi everyone,
I would like to estimate the variance of estimated proportions. I
think that it is generally correct to estimate the variance by
VAR^=np^(1-p^), where p^ is the estimated proportion and n the
sample size. However, when p^=0, this leads to an estimated variance
VAR^=0, which puzzles me because the variance then no longer depends
on n. It is quite different to obtain p^=0 with n=2 than with
n=100000, for instance. Does anyone know how to estimate the variance
of a proportion when the estimated proportion is zero?


If you _know_ that p or (1-p) is zero, then the variance _is_ zero.
If you do not know that p=0, then why would you torture the formula
into giving you a meaningless estimate?

If you think that the proportion of successes, hits or events is not
necessarily zero, then you should not use zero, but rather a small
non-zero estimate. In that instance, the Poisson distribution and
associated methods might be mathematically convenient. The variance
of the Poisson equals the expected value which is np. The
reasonableness of the Poisson approximation (with large n) to the
binomial is easy to see, since np(1-p) will approach np because (1-p)
is near 1.

And then you plug in p=0... :-)

One can calculate the 95% confidence interval for such a proportion by
plotting the likelihood against p, and then finding the 95% quantile
(the lower limit will be zero: you can calculate a symmetric CI, but
it doesn't make sense as it doesn't include the point estimate). If
the proportion comes from a series of Bernoulli trials, then the
likelihood, viewed as a function of p, is proportional to a Beta
density, so a good stats package will be able to calculate it.
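
For the zero-event case this can be sketched directly. Assuming the likelihood (1-p)^n is normalized over [0, 1] (equivalent to a flat prior, which is an assumption not stated above), it becomes a Beta(1, n+1) density whose CDF is 1-(1-p)^(n+1), so the 95% quantile has a closed form. The Python below is only an illustration; the function names are made up for this example.

```python
def beta_upper_zero_events(n, level=0.95):
    """Quantile of Beta(1, n+1): normalized likelihood for 0 events in n trials."""
    # CDF is F(p) = 1 - (1 - p)**(n + 1); solve F(p) = level for p.
    return 1.0 - (1.0 - level) ** (1.0 / (n + 1))


def numeric_check(n, upper, steps=20000):
    """Midpoint-rule area under the normalized likelihood from 0 to upper."""
    h = upper / steps
    area = sum((1.0 - (i + 0.5) * h) ** n for i in range(steps)) * h
    # The integral of (1 - p)**n over [0, 1] is 1 / (n + 1), so multiply to normalize.
    return area * (n + 1)


for n in (2, 100):
    u = beta_upper_zero_events(n)
    # The second number printed should be close to the requested level.
    print(n, u, numeric_check(n, u))
```

The upper limit shrinks as n grows, which is the dependence on n that the plug-in variance loses when p^=0.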


It did independently occur to me that we could offer the OP a discussion
of confidence limits around an observed value of zero. So I (re)post
with full knowledge that I may be straying over the Uebersax Line by
responding to what I thought the OP should have asked.

One fairly widely known quick approximation for the upper 95% confidence
bound on p from a sample that produced zero events out of n cases is 3/n.
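Another (possibly more accurate, but certainly more flexible, since you
can vary alpha) approximation to the upper limit on p for zero
observations in n cases is 1-alpha^(1/n), with alpha=0.05 for a 95%
bound. This is the p at which the chance of seeing zero events in n
cases, (1-p)^n, equals alpha. These are not p-values but rather bounds
on the binomial parameter p.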
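
A small sketch of how the two bounds compare, in Python rather than R purely for illustration (the function names are hypothetical, not from any package):

```python
def rule_of_three(n):
    """Quick upper 95% bound on p after 0 events in n cases."""
    return 3.0 / n


def upper_bound_zero_events(n, alpha=0.05):
    """Upper bound on p from solving (1 - p)**n = alpha."""
    return 1.0 - alpha ** (1.0 / n)


for n in (2, 10, 100, 100000):
    print(n, rule_of_three(n), upper_bound_zero_events(n))
```

For small n the 3/n rule can exceed 1 (it gives 1.5 at n=2), whereas 1-alpha^(1/n) always stays inside [0, 1]; for large n the two nearly coincide because -log(0.05) is about 3.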

Then there are the exact tests. In an R session you can get a variety of
such estimates.

The Hmisc package (many thanks to Frank Harrell) has binconf(obs,n), which
defaults to the Wilson method; the R stats package has binom.test(obs,n),
which reports the Clopper-Pearson interval; and x <- 1-alpha^(1/n) would
give you a quick obs=0 approximation.
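
For readers without R to hand, the two intervals can be re-derived from their textbook formulas. The Python below is a sketch using only the standard library; it is not the code of binconf or binom.test, the function names are hypothetical, and the Clopper-Pearson helper is deliberately restricted to the obs=0 case, where the limits have a closed form.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, alpha=0.05):
    """Wilson score interval for x successes in n trials."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_hat = x / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def clopper_pearson_zero(n, alpha=0.05):
    """Two-sided Clopper-Pearson interval, valid only for x = 0 events."""
    # With x = 0 the lower limit is 0 and the upper limit solves (1 - p)**n = alpha / 2.
    return 0.0, 1.0 - (alpha / 2.0) ** (1.0 / n)


for n in (2, 100, 100000):
    print(n, wilson_interval(0, n), clopper_pearson_zero(n))
```

Note that the two-sided Clopper-Pearson upper limit uses alpha/2, so at the same nominal 95% level it is somewhat larger than the one-sided 1-alpha^(1/n) bound.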

Agresti has a website that includes some other alternatives written in R:
http://www.stat.ufl.edu/~aa/cda/R/one_sample/R1/

The SAS exact single sample binomial CI estimates (I have read) are based
on the Clopper/Pearson formula.

--
David Winsemius


