Re: Hypothesis testing with proportions



> So in this case, it's probably not worth splitting hairs about how many angels
> can dance on the head of a pin.

Very very true!

But it's an interesting problem in its own right, so let's try to tease
it apart (CAUTION: rant with unnecessary details follows).

The main question is: Is the proportion of mutant DNA equal to .5?
Denote the true proportion of mutant DNA as phi.

Now, one could take a sample of size n and calculate the proportion of
mutant DNA in that sample (call that p). Under H0: phi = .5, we know
that

p ~ N(.5, .5(1-.5)/n)

based on the normal approximation to the binomial distribution. The
typical method to test H0 is to calculate

z = (p - .5) / sqrt(.5(1-.5)/n),

which, under H0, follows approximately a standard normal distribution.
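
For concreteness, here is a minimal sketch of that z-test in Python, assuming n were actually known (which, as it turns out, it is not here). The function name and the numbers (30% mutant, n = 100) are made up purely for illustration.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z(p_hat, n, phi0=0.5):
    """z statistic and two-sided p-value for H0: phi = phi0 (normal approximation)."""
    se = math.sqrt(phi0 * (1 - phi0) / n)  # standard error of p under H0
    z = (p_hat - phi0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided, standard normal
    return z, p_value

# Hypothetical numbers: 30% mutant in a sample of n = 100 units.
z, p_value = one_sample_proportion_z(0.30, 100)
print(z, p_value)
```

The whole calculation hinges on n, which is exactly what goes missing in the next paragraph.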

Now, it seems we have a problem here, because we don't know what n in a
single sample is! It's one of those situations where all you can say
is: "it's 30% mutant and 70% other" and that's it -- just like you can
fill a glass with 30% milk and 70% water and wouldn't know n in that
case either.

Okay, so we can take another approach. Take i = 1, ..., 45 separate
proportions. Denote them as p_1, ..., p_45. Under H0,

p_i ~ N(.5, .5(1-.5)/n_i).

Now, I think it is perfectly reasonable to calculate

(bar(p) - .5) / (sd(p)/sqrt(45)),

where bar(p) is the mean of the proportions and sd(p) the standard
deviation of the 45 proportions, and just compare that against a
standard normal distribution. For conservativeness, we could use a
t-distribution, but with df = 44, it makes little difference.
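
A rough sketch of this calculation, using only the Python standard library. The eight proportions below are invented just to make the snippet run (the real data would have 45 values), and the helper name is hypothetical. The reference distribution is the standard normal, as described above; a t-distribution would be slightly more conservative.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_proportion_test(props, phi0=0.5):
    """Test whether the mean of several proportions equals phi0.

    Uses the empirical SD of the proportions, so the unknown n_i never enter.
    """
    k = len(props)
    se = stdev(props) / math.sqrt(k)  # standard error of bar(p)
    stat = (mean(props) - phi0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))  # normal reference
    return stat, p_value

# Made-up proportions, for illustration only.
props = [0.42, 0.55, 0.38, 0.47, 0.51, 0.44, 0.40, 0.49]
stat, p_value = mean_proportion_test(props)
print(stat, p_value)
```

The point of the design is that the standard error comes from the spread of the observed proportions, not from the binomial formula.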

Now, I see two issues:

1) The 45 samples are taken from the same individual. So, can we simply
treat p_1, ..., p_45 as independent? Maybe tissue samples taken from
regions closer to each other are more alike than samples from regions
further apart. Then we get dependencies between the proportions. This
is more of a question for a biologist to answer.

2) Can we assume that phi is equal to .5 for all 45 samples? If phi can
be assumed not to vary between the samples (again, a question for a
biologist), then all is fine. However, take the case where phi varies.
Then we really have phi_1, ..., phi_45 and each p_i actually comes from
a different distribution. To go further, we may need some additional
distributional assumptions. Assume that the phi values are actually
sampled from a normal distribution with expected value mu_phi and
variance var_phi. Therefore:

phi_i ~ N(mu_phi, var_phi)
p_i ~ N(phi_i, phi_i(1-phi_i)/n_i)
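
To get a feel for this two-level model, one can simulate it. The sketch below is only an illustration under assumptions of my own choosing: k = 45 samples, a common n = 200, mu_phi = .5 and an SD of .05 for the phi values are all invented, the phi_i are clipped to [0, 1], and the second level is drawn as an actual binomial count rather than through the normal approximation written above.

```python
import random

def simulate_proportions(k=45, n=200, mu_phi=0.5, sd_phi=0.05, seed=1):
    """Draw k proportions from a two-level (random-effects) model."""
    rng = random.Random(seed)  # seeded, so the result is reproducible
    props = []
    for _ in range(k):
        # Level 1: the true proportion for this sample, clipped to [0, 1].
        phi_i = min(max(rng.gauss(mu_phi, sd_phi), 0.0), 1.0)
        # Level 2: n Bernoulli(phi_i) draws, i.e. a binomial count.
        successes = sum(rng.random() < phi_i for _ in range(n))
        props.append(successes / n)
    return props

props = simulate_proportions()
print(min(props), max(props))
```

With sd_phi > 0, the proportions spread out more than binomial sampling alone would produce, which is the heterogeneity at issue here.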

Now what we would test is whether the expected value of the phi values
(mu_phi) is equal to .5 or not. Assume that the n_i's are constant
across the samples (all equal to n). Plus, we apply the arcsine
transformation. Then we have:

p*_i = arcsine(sqrt(p_i)) ~ N(arcsine(sqrt(phi_i)), 1/(4n)).

The optimally weighted average of the p*_i values uses w_i = 1/(var_phi
+ 1/(4n)) as its weights (var_phi here being the variance of the phi
values on the transformed scale). We still don't know n (or var_phi),
but that's not a problem, because the weights are all equal to each
other, so the weighted average of the p*_i values is equal to the
simple average. Then we are back to

(bar(p*) - arcsine(sqrt(.5))) / (sd(p*)/sqrt(45)),

this time using the transformed values (so the null value .5 becomes
arcsine(sqrt(.5)) = pi/4, or about .785), which now tests H0: mu_phi = .5.
Without the arcsine transformation, you get heterogeneous sampling
variances and this approach isn't quite applicable anymore. Of course,
the assumption of equal n's also needs to hold.
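
Here is a small sketch of the test on the arcsine scale, again with invented proportions and a hypothetical helper name. Note that the sketch compares the mean of the transformed values against arcsine(sqrt(.5)) = pi/4, i.e. the null value .5 expressed on the transformed scale, and that it assumes equal n's, so the simple average is the optimally weighted one.

```python
import math
from statistics import NormalDist, mean, stdev

def arcsine_mean_test(props, phi0=0.5):
    """Test the mean of arcsine-square-root transformed proportions."""
    transformed = [math.asin(math.sqrt(p)) for p in props]
    target = math.asin(math.sqrt(phi0))  # pi/4 when phi0 = .5
    se = stdev(transformed) / math.sqrt(len(transformed))
    stat = (mean(transformed) - target) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))  # normal reference
    return stat, p_value

# Made-up proportions, for illustration only.
props = [0.42, 0.55, 0.38, 0.47, 0.51, 0.44, 0.40, 0.49]
stat, p_value = arcsine_mean_test(props)
print(stat, p_value)
```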

This problem actually calls on methods frequently used in
meta-analysis. Whether the phi values are homogeneous or heterogeneous
gets into the same issue as whether to apply a fixed- or a
random-effects model in meta-analysis.

m00es
