Re: Testing the Equality of Two Population Proportions
- From: "Reef Fish" <large_nassua_grouper@xxxxxxxxx>
- Date: 7 Nov 2006 08:25:04 -0800
Jerry.Dallal@xxxxxxxxx wrote:
Reef Fish wrote:
Jerry.Dallal@xxxxxxxxx wrote:
Snedecor & Cochran, 7-th, p 125
Freedman Pisani & Purves, section 27.2
Robbins & van Ryzin, p 192
Fisher & van Belle, p 187
Ott & Longnecker 5-th, p 484
Also, see Eberhardt & Fligner (1977), The American Statistician, 31,
151-155
I am sure Jerry Dallal's references have to do with what I called an
"error" in testing Ho: p1 - p2 = 0 with the form of the test
statistic
Z = (p1^ - p2^)/ sqrt( var )
WITHOUT using the pooled variance p^( 1-p^)(1/n1 + 1/n2)
which is found in EVERY elementary textbook I've used in 25 years
of teaching the FIRST course in statistics (at least half a dozen
different books of that level).
The REASON: The TEST STATISTIC for TESTING Ho must
incorporate (whenever appropriate) the fact that Ho is TRUE.
I know of no such principle. The only requirement is that the
distribution of the test statistic be known when the null hypothesis is
true.
I agree that YOU know of no such principle. That's your silly idea.
You know the distribution of YOUR test statistic with [ (p1^ - p2^) - .75 ]
in the numerator, don't you?
Would you use that to test: Ho: p1 - p2 = 0 ?
Why or why not?
When Ho is TRUE, p1 = p2 , so that there is only ONE unknown
p. Therefore, why should one NOT use the pooled p^ for the
variance in the test statistic by the common p^ = (x1+x2)/(n1+n2)?
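To make the two competing forms concrete, here is a minimal Python sketch that computes both versions of Z from the same counts. The function name `two_proportion_z` and the counts are made up for illustration; nothing here comes from either poster's own material.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z_pooled, z_unpooled) for the difference p1^ - p2^."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    diff = p1_hat - p2_hat

    # Pooled estimate of the single common p assumed under Ho: p1 = p2.
    p_pool = (x1 + x2) / (n1 + n2)
    var_pooled = p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)

    # Separate estimates, one per sample (no use of Ho).
    var_unpooled = p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2

    return diff / math.sqrt(var_pooled), diff / math.sqrt(var_unpooled)

# Hypothetical counts: 45 successes out of 100 versus 30 out of 100.
z_pooled, z_unpooled = two_proportion_z(45, 100, 30, 100)
print(f"pooled z = {z_pooled:.4f}, unpooled z = {z_unpooled:.4f}")
```

The numerator is identical in both; only the variance estimate in the denominator differs. The sketch does not guard against a zero variance (for example, all successes in both samples).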
The REASON for using the pooled variance is NOT whether one
form of the approximation is better than the other (whatever that
means), but to satisfy the definition of "alpha" and "p-value"
associated with the test Ho: p1 = p2. For Ha: p1-p2 > 0,
alpha = Pr ( TEST STAT > c | Ho is true)
p-value = Pr (TEST STAT > observed Z* when Ho is true).
In BOTH cases, "when Ho is TRUE" is imbedded in the definition.
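As a small sketch of those two definitions for Ha: p1-p2 > 0, the upper-tail p-value can be computed from the standard normal distribution that Z is assumed to follow when Ho is true. The function names are hypothetical, and the normal approximation itself is the large-sample assumption discussed in this thread.

```python
import math

def upper_tail_p_value(z_observed):
    """Pr(Z > z_observed) for standard normal Z, i.e. evaluated as if Ho is true."""
    return 0.5 * math.erfc(z_observed / math.sqrt(2))

def rejects_at_alpha(z_observed, alpha):
    """Fixed-alpha decision expressed through the p-value: reject when p-value < alpha."""
    return upper_tail_p_value(z_observed) < alpha

# Hypothetical observed statistics.
for z in (1.0, 2.19):
    print(z, upper_tail_p_value(z), rejects_at_alpha(z, 0.05))
```

Rejecting when the p-value falls below alpha is the same decision as rejecting when Z exceeds the critical value c with Pr(Z > c) = alpha.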
So, the question can be asked from the other direction, that
if we assume p1 = p2, then WHY do you use two DIFFERENT
estimates for p in the variance formula which has only ONE
unknown p?
Because for large samples, the distribution of the test statistic with
two different estimates of P is known when the null hypothesis is true
and it may be more powerful in some cases when the null is false.
The power is a separate consideration, unrelated to the definition
of "alpha" or "p-value".
THAT's the crux of the issue.
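Both quantities in this exchange, the actual alpha and the power, can be computed exactly for small samples by enumerating every possible pair of counts, instead of relying on the normal approximation. The sketch below is one way to do that. The sample sizes (30 and 30), the probabilities (0.3 and 0.5), the one-sided critical value 1.6449, and all function names are arbitrary choices for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability Pr(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def z_stat(x1, n1, x2, n2, pooled):
    """Z for p1^ - p2^ with either the pooled or the separate variance estimate."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    if pooled:
        p = (x1 + x2) / (n1 + n2)
        var = p * (1 - p) * (1 / n1 + 1 / n2)
    else:
        var = p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    if var <= 0:
        # Degenerate tables: treat a nonzero difference as infinitely extreme.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(var)

def rejection_probability(p1, p2, n1, n2, pooled, critical=1.6449):
    """Exact Pr(Z > critical) when X1 ~ Bin(n1, p1) and X2 ~ Bin(n2, p2) independently."""
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if z_stat(x1, n1, x2, n2, pooled) > critical:
                total += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return total

# Actual size when Ho is true (p1 = p2 = 0.3), nominal one-sided alpha = 0.05.
size_pooled = rejection_probability(0.3, 0.3, 30, 30, pooled=True)
size_unpooled = rejection_probability(0.3, 0.3, 30, 30, pooled=False)

# Power at one alternative (p1 = 0.5, p2 = 0.3).
power_pooled = rejection_probability(0.5, 0.3, 30, 30, pooled=True)
power_unpooled = rejection_probability(0.5, 0.3, 30, 30, pooled=False)

print("size :", size_pooled, size_unpooled)
print("power:", power_pooled, power_unpooled)
```

With equal sample sizes the separate-variance estimate is never larger than the pooled one, so the unpooled Z rejects at least as often; for unequal sample sizes the ordering can go either way. The exact size depends on the unknown common p, so one value of p does not settle the question for all of them.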
I noticed that Jerry did not give the publication dates of those
books, and only the date of the American Statistician article
WITHOUT the specific discussion of the point I made above
and elsewhere in my "Hypothesis Testing" topics about this
particular problem of testing the EQUALITY of two independent
p's with an approximate Z. (Large Sample case).
The two different z's, z1 and z2 Jerry used in his webpage
on the problem are both perfectly valid, for constructing
CONFIDENCE INTERVALS.
In C.I., the only relevant assumption is that the Statistic is
approximately normal. There is NOTHING in the construction
of confidence intervals for (p1-p2) that assumes p1 = p2! :-)
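For contrast with the test, here is a minimal sketch of the usual large-sample interval for (p1-p2), which uses the two separate estimates and never assumes p1 = p2. The function name `wald_interval`, the counts, and the 1.96 multiplier (roughly 95% coverage) are illustrative assumptions.

```python
import math

def wald_interval(x1, n1, x2, n2, z_crit=1.96):
    """Large-sample interval for p1 - p2 using separate variance estimates."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 45 successes out of 100 versus 30 out of 100.
lower, upper = wald_interval(45, 100, 30, 100)
print(f"approximate 95% interval for p1 - p2: ({lower:.4f}, {upper:.4f})")
```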
The textbooks I am referring to are NEARLY ALL (I am
learning to be careful on that :-)) and ALL of those ones
that were used in my university in ALL the courses I've
taught at the FIRST COURSE level, to all majors. Those
include English majors, and other liberal arts majors for
which the one course is their last statistics course.
The period covered 1975 - 1999. The reason I am
particularly familiar with this particular problem is that
the same test is used in several DIFFERENT first courses,
for math majors, for engineering majors, for nursing majors,
and for liberal arts majors. They ALL had this approach
in common, and they ALL make sense, because otherwise
it would be rather difficult to make sense about p-values
and alpha, while ignoring the clause "when Ho is TRUE".
The thing about elementary statistics texts is that they are
elementary, perhaps too much so sometimes. It's easier to give the
pooled version, which agrees with the chi-square statistic, than to go
into the issues we are discussing here.
The chi-square statistic has NOTHING to do with it. That's your
excuse.
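Whatever weight one gives it in this argument, the numerical agreement mentioned above is easy to check: for a 2x2 table, the Pearson chi-square statistic (without continuity correction) equals the square of the pooled Z. The sketch below uses hypothetical function names and made-up counts.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Z for p1^ - p2^ using the pooled variance p^(1-p^)(1/n1 + 1/n2)."""
    p = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def pearson_chi_square(x1, n1, x2, n2):
    """Pearson chi-square for the 2x2 table of successes/failures, no continuity correction."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    grand_total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 45 successes out of 100 versus 30 out of 100.
z = pooled_z(45, 100, 30, 100)
chi2 = pearson_chi_square(45, 100, 30, 100)
print(f"z^2 = {z**2:.6f}, chi-square = {chi2:.6f}")
```

The identity holds only for the pooled form; the separate-variance Z has no such counterpart in the ordinary chi-square test.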
More important, do you know of any text, elementary or otherwise, that
recommends against the separate p^ version? I know of one, but its
justification is a MISINTERPRETATION of the 1977 American Statistician
paper I cited.
Who cares about your 1977 American Statistician paper. I had given
you ALL the REASONS. You haven't given one single reason WHY
you don't use the pooled variance given all the reasons to use it.
Snedecor, Cochran, Robbins, and Freedman are GIANTS in the field.
While no one is perfect, I find it much more likely that the four of
them writing in three separate texts are more likely to be correct than
any one of us writing here. I strongly suggest that anyone who wants
to comment further begin by reading the references (especially the 1977
American Statistician paper). I have no interest in discussing the
point further with anyone who has not.
You are discussing and namedropping OUT OF CONTEXT. Quote
me one passage where they give the REASON for testing Ho: p1=p2
and using the Z approximation, WITHOUT the pooled variance.
It is a very SPECIAL case. Everything you are throwing around is
completely non-specific. I had already said the z1 is acceptable
in ALL situations of (p1-p2) EXCEPT testing p1 - p2 = 0.
You have given NO specific other than your own webpage,
which is glaringly WRONG in that particular use.
I recall Jerry Dallal and I had a rather extended discussion
of the definition and operational meaning of "p-value"
and Jerry kept citing the book by two statisticians who
did not even include Ha as part of the definition of p-value,
which would make it impossible to tell what "more extreme"
means in:
p-value = Pr (TEST STAT is "more extreme" than the
observed value of the TEST STATISTIC
when Ho is TRUE).
It's all related and consistent. I think I am quoting Jerry
Dallal correctly when I said,
RF> p-value = Pr (observing something more extreme
RF> than the observed T* when Ho is true).
and Jerry had a one-word response:
RF> irrelevant
The "irrelevant" was in response to your email to me over the weekend.
If we are in agreement that the separate p^ statistic has a standard
normal distribution when H0 is true, then there is no need to focus on
P values or any other particular use of that distribution other than
that we can construct a test with specified alpha. Diverting the
discussion to P values is a distraction.
You're changing your tune now. Ho being true affects the DENOMINATOR
of the test statistic, that is, how the variance should be estimated. Of
course you need separate p1 and p2 estimates in the numerator.
So, you have not given one single valid reason WHY you want to
estimate the variance of p-hat when there is only ONE p-hat,
(when Ho is true) by a variance estimate based on two separate
p-hat estimates.
-- Reef Fish Bob,
I think that pretty much summed up our disagreement. I
think the meaning and definition of p-value is RELEVANT,
to make the use of p-value consistent with and equivalent to the
use of a fixed alpha level test.
In BOTH cases, it IS relevant to know "when Ho is TRUE"
is part of the definition.
-- Reef Fish Bob.