Re: Prevalence estimates from population characteristics - HELP!

From: George Kahrimanis (anakreon_at_hol.gr)
Date: 11/18/04


Date: Thu, 18 Nov 2004 20:22:37 +0200
To: Karin Jensen <k_jensen@mailblocks.com>

Karin Jensen wrote, on 2004-11-07 11:11:03 PST,

>The authoritative document [...] gives [...]
> 95% CIs for prevalence rates for different demographic
>characteristics (e.g. 1.9% +/- 0.3% in whites, say, 1.5%
> +/- 0.5% in people of Chinese origin, etc.).

>I am trying to estimate the prevalence of a disease in different
>areas, for health services provision modelling.

>From the Census and other data sources, I know the number of people
>in each area in each category, such as the number in each age group,
>ethnic group or social class.

I would imagine that textbooks dealing with logistic regression
treat this problem, too. I have never needed to study these
techniques, so I do not know what *the* professionally accepted
solution would be. I hope that someone else will give you
a lead on that; you could then compare that solution with the
suggestions in the rest of this message.

Method #1

If your boss and supervisors are skeptical of Bayesian assumptions,
so that it would be too hard to argue with them, an easy way to do
the job is to calculate the provisions for each subgroup separately
and then simply order the sum of these numbers.

You may be criticised because, in comparison with Method #2, you would
be ordering a larger number of provisions, due to overestimating
the uncertainty in the total number of sick people. You may reply that
if diseases have external causes, there will be strong inter-group
correlations, so that Method #1 would be appropriate, at least until
a full treatment is available in which every possible external cause
is taken explicitly into account.
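A minimal Python sketch of Method #1, under an assumption the post does not spell out: each subgroup's provisions are sized for the upper end of its 95% CI. The rates are the ones quoted from Karin's message; the population counts (80,000 and 5,000) and the name `method1_total` are hypothetical, for illustration only.

```python
# Method #1 sketch: size provisions per subgroup, then add them up.
# ASSUMPTION: "provisions" = expected cases at the upper 95% limit of each
# subgroup's prevalence. Population counts below are hypothetical.

def method1_total(groups):
    """Sum of per-subgroup upper-limit case counts.

    groups: list of (population, prevalence, half_width), rates as fractions,
    where half_width is the "+/-" part of the subgroup's 95% CI.
    """
    total = 0.0
    for population, prevalence, half_width in groups:
        # Cases this subgroup would need provisions for at its upper limit.
        total += population * (prevalence + half_width)
    return total

area = [
    (80_000, 0.019, 0.003),  # whites: 1.9% +/- 0.3% (quoted rate)
    (5_000, 0.015, 0.005),   # Chinese origin: 1.5% +/- 0.5% (quoted rate)
]

print(round(method1_total(area)))
```

Here the expected number of cases is 1520 + 75 = 1595 and the sketch adds a margin of 240 + 25 = 265 on top, for 1860 in total. Summing the upper limits amounts to adding the subgroup sigmas themselves rather than their variances, which is the rule for perfectly correlated subgroups; that is why it fits the strong-correlation argument above.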

Method #2

For the moment, let us forget the suspected correlations. (You can
introduce them later on your own.) We presume that the measured
samples, as well as the targeted populations, were/are large enough
for normal approximations to apply. It will be quite reasonable to
use Bayesian assumptions, at least as an approximation. Then the
"+/-" part of each 95% CI is 1.96 times the sigma of the corresponding
posterior, so each sigma is the "+/-" figure divided by 1.96.
Now we are just adding independent normal random variables: for the
mean, we take the population-weighted average of the particular means;
for the variance of the combined population, we add up the particular
variances, each multiplied by the square of its subgroup's share of
the population; we obtain the combined sigma by extracting the square
root. Multiply the mean and the sigma by the area's total population
to express them as numbers of sick people.
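In symbols, as a sketch: write N_i for the number of people in subgroup i of an area, N for the sum of the N_i, p_i for the subgroup's point estimate and h_i for the "+/-" half-width of its 95% CI (this notation is introduced here for illustration). Assuming independent normal posteriors, the combination rule reads

```latex
w_i = \frac{N_i}{N}, \qquad
\sigma_i = \frac{h_i}{1.96}, \qquad
\hat{p} = \sum_i w_i \, p_i, \qquad
\sigma = \sqrt{\sum_i w_i^2 \, \sigma_i^2}
```

and the expected number of sick people in the area is N \hat{p}, with sigma N \sigma.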

If you must, you can present your result as a 95% CI, using
+/- 1.96 sigma, and let sleeping frequentists lie.
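The following Python sketch carries out Method #2 as described: it assumes the subgroup posteriors are independent and normal, combines them, and reports a 95% interval for the area. It reuses the rates quoted from Karin's message; the population counts and the name `method2_combined` are hypothetical, for illustration only.

```python
import math

Z95 = 1.96  # two-sided 95% normal quantile

def method2_combined(groups):
    """Combine independent normal posteriors for subgroup prevalence rates.

    groups: list of (population, prevalence, half_width), rates as fractions,
    where half_width is the "+/-" part of the subgroup's 95% CI.
    Returns (mean_rate, sigma_rate) for the whole area.
    """
    n_total = sum(pop for pop, _, _ in groups)
    mean = 0.0
    variance = 0.0
    for pop, prevalence, half_width in groups:
        w = pop / n_total             # population share of the subgroup
        sigma = half_width / Z95      # posterior SD recovered from the CI
        mean += w * prevalence        # population-weighted average of means
        variance += (w * sigma) ** 2  # variances add, weighted by w squared
    return mean, math.sqrt(variance)

# Hypothetical area: populations are invented, rates are from the quoted post.
area = [
    (80_000, 0.019, 0.003),  # whites: 1.9% +/- 0.3%
    (5_000, 0.015, 0.005),   # Chinese origin: 1.5% +/- 0.5%
]

n_total = sum(pop for pop, _, _ in area)
mean, sigma = method2_combined(area)
low, high = mean - Z95 * sigma, mean + Z95 * sigma

print(f"95% CI for the rate: {100 * low:.2f}% to {100 * high:.2f}%")
print(f"expected cases: {n_total * mean:.0f}, upper 95% limit: {n_total * high:.0f}")
```

With these inputs the expected number of cases is 1595 and the upper 95% limit comes to about 1836, against 1760 + 100 = 1860 if each subgroup's upper limit were summed instead. The gap is the "overestimated uncertainty" mentioned under Method #1.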

Caveat. I have inserted secret bugs in these methods to
prevent their use in biological warfare :-P

Missing point. I have not explained what I regard as the
exact treatment to which Bayesian assumptions are a good
approximation. That is not the issue here.

~ George Kahrimanis

