Re: Testing the Equality of Two Population Proportions




Reef Fish wrote:
Jerry.Dallal@xxxxxxxxx wrote:
Snedecor & Cochran, 7-th, p 125
Freedman Pisani & Purves, section 27.2
Robbins & van Ryzin, p 192
Fisher & van Belle, p 187
Ott & Longnecker 5-th, p 484

Also, see Eberhardt & Fligner (1977), The American Statistician, 31,
151-155

I am sure Jerry Dallal's references have to do with what I called an
"error" in testing Ho: p1 - p2 = 0 with the form of the test
statistic
Z = (p1^ - p2^)/ sqrt( var )

WITHOUT using the pooled variance p^( 1-p^)(1/n1 + 1/n2)

which is found in EVERY elementary textbook I've used in 25 years
of teaching the FIRST course in statistics (at least half a dozen
different books of that level).

The REASON: The TEST STATISTIC for TESTING Ho must
incorporate (whenever appropriate) the fact that Ho is TRUE.

I know of no such principle. The only requirement is that the
distribution of the test statistic be known when the null hypothesis is
true.

When Ho is TRUE, p1 = p2 , so that there is only ONE unknown
p. Therefore, why should one NOT use the pooled p^ for the
variance in the test statistic by the common p^ = (x1+x2)/(n1+n2)?

The REASON for using the pooled variance is NOT whether one
form of the approximation is better than the other (whatever that
means), but to satisfy the definition of "alpha" and "p-value"
associated with the test Ho: p1 = p2. For Ha: p1-p2 > 0,

alpha = Pr ( TEST STAT > c | Ho is true)

p-value = Pr (TEST STAT > observed Z* when Ho is true).

In BOTH cases, "when Ho is TRUE" is embedded in the definition.
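
For readers following along, here is a minimal Python sketch of the two forms of Z under discussion, together with the one-sided p-value for Ha: p1 - p2 > 0 as defined above. The function names and the counts are hypothetical and chosen only for illustration; neither poster supplied code or data.

```python
import math

def two_proportion_z(x1, n1, x2, n2, pooled=True):
    """Z = (p1^ - p2^) / sqrt(var), with a pooled or a separate variance estimate."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    if pooled:
        # One common p^, as implied by Ho: p1 = p2
        p_hat = (x1 + x2) / (n1 + n2)
        var = p_hat * (1 - p_hat) * (1 / n1 + 1 / n2)
    else:
        # Two separate estimates, one per sample
        var = p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    return (p1_hat - p2_hat) / math.sqrt(var)

def upper_tail_p_value(z):
    """Pr(Z > z) for a standard normal Z (large-sample approximation)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100
x1, n1, x2, n2 = 60, 100, 45, 100
for pooled in (True, False):
    z = two_proportion_z(x1, n1, x2, n2, pooled=pooled)
    label = "pooled" if pooled else "separate"
    print(f"{label:8s} z = {z:.4f}  one-sided p-value = {upper_tail_p_value(z):.4f}")
```

With equal sample sizes the pooled variance is never smaller than the separate-variance estimate, so the separate form gives the larger |Z| on the same data.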

So, the question can be asked from the other direction, that
if we assume p1 = p2, then WHY do you use two DIFFERENT
estimates for p in the variance formula which has only ONE
unknown p?

Because for large samples, the distribution of the test statistic with
two different estimates of P is known when the null hypothesis is true
and it may be more powerful in some cases when the null is false.

THAT's the crux of the issue.

I noticed that Jerry did not give the publication dates of those
books, and only the date of the American Statistician article
WITHOUT the specific discussion of the point I made above
and elsewhere in my "Hypothesis Testing" topics about this
particular problem of testing the EQUALITY of two independent
p's with an approximate Z. (Large Sample case).

The two different z's, z1 and z2 Jerry used in his webpage
on the problem are both perfectly valid, for constructing
CONFIDENCE INTERVALS.

In C.I., the only relevant assumption is that the Statistic is
approximately normal. There is NOTHING in the construction
of confidence intervals for (p1-p2) that assumes p1 = p2! :-)
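
As an illustration of that point, here is a short Python sketch of the usual large-sample interval for p1 - p2, which uses the separate-variance standard error and nowhere assumes p1 = p2. The function name and the counts are hypothetical.

```python
import math

def diff_confidence_interval(x1, n1, x2, n2, z_crit=1.96):
    """Approximate CI for p1 - p2 (z_crit = 1.96 gives roughly 95% coverage)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error built from two separate estimates; no pooling
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100
low, high = diff_confidence_interval(60, 100, 45, 100)
print(f"approximate 95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```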

The textbooks I am referring to are NEARLY ALL (I am
learning to be careful on that :-)) and ALL of those ones
that were used in my university in ALL the courses I've
taught at the FIRST COURSE level, to all majors. Those
include English majors, and other liberal arts majors for
which the one course is their last statistics course.

The period covered 1975 - 1999. The reason I am
particularly familiar with this particular problem is that
the same test is used in several DIFFERENT first courses,
for math majors, for engineering majors, for nursing majors,
and for liberal arts majors. They ALL had this approach
in common, and they ALL make sense, because otherwise
it would be rather difficult to make sense about p-values
and alpha, while ignoring the clause "when Ho is TRUE".


The thing about elementary statistics texts is that they are
elementary, perhaps too much so sometimes. It's easier to give the
pooled version, which agrees with the chi-square statistic, than to go
into the issues we are discussing here.
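
The agreement with the chi-square statistic can be verified numerically: the square of the pooled Z equals the Pearson chi-square statistic (without continuity correction) for the corresponding 2x2 table. The Python sketch below uses hypothetical function names and counts.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Two-proportion Z using the pooled estimate p^ = (x1 + x2)/(n1 + n2)."""
    p_hat = (x1 + x2) / (n1 + n2)
    var = p_hat * (1 - p_hat) * (1 / n1 + 1 / n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(var)

def pearson_chi_square(x1, n1, x2, n2):
    """Pearson chi-square for the 2x2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100
z = pooled_z(60, 100, 45, 100)
print("Z squared: ", z ** 2)
print("chi-square:", pearson_chi_square(60, 100, 45, 100))
```

The separate-variance Z has no such identity with the chi-square statistic.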

More important, do you know of any text, elementary or otherwise, that
recommends against the separate p^ version? I know of one, but its
justification is a MISINTERPRETATION of the 1977 American Statistician
paper I cited.

Snedecor, Cochran, Robbins, and Freedman are GIANTS in the field.
While no one is perfect, I find it much more likely that the four of
them, writing in three separate texts, are correct than that any one
of us writing here is. I strongly suggest that anyone who wants
to comment further begin by reading the references (especially the 1977
American Statistician paper). I have no interest in discussing the
point further with anyone who has not.

I recall Jerry Dallal and I had a rather extended discussion
of the definition and operational meaning of "p-value"
and Jerry kept citing the book by two statisticians who
did not even include Ha as part of the definition of p-value,
which would make it impossible to tell what "more extreme"
means in:

p-value = Pr (TEST STAT is "more extreme" than the
observed value of the TEST STATISTIC
when Ho is TRUE).

It's all related and consistent. I think I am quoting Jerry
Dallal correctly when I said,

RF> p-value = Pr (observing something more extreme
RF> than the observed T* when Ho is true).

and Jerry had a one-word response:

RF> irrelevant

The "irrelevant" was in response to your email to me over the weekend.
If we are in agreement that the separate p^ statistic has a standard
normal distribution when H0 is true, then there is no need to focus on
P values or any other particular use of that distribution other than
that we can construct a test with specified alpha. Diverting the
discussion to P values is a distraction.

I think that pretty much summed up our disagreement. I
think the meaning and definition of p-value is RELEVANT,
to make the use of p-value consistent with and equivalent to the
use of a fixed alpha level test.

In BOTH cases, it IS relevant to know "when Ho is TRUE"
is part of the definition.

-- Reef Fish Bob.
