Re: Trying to find significant factors in experimental results
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Sun, 13 Apr 2008 20:52:42 -0400
On Sun, 13 Apr 2008 07:12:25 -0700 (PDT), Rob <rtshilston@xxxxxxxxx>
wrote:
Rich,[snip, most]
Thanks for your response. I've interspersed my comments below.
RU > >
That is surely not the case. The most powerful
tests have equal group sizes; and cross-classifications
need cell sizes that are proportionate if the tests are
to remain independent and "unconfounded" with each
other. There still can be tests.
I see. So tests on fringe populations (e.g. short-sighted, colour-blind,
wearing contact lenses, against those who aren't) are less powerful
because the groups are enormously unbalanced? I presume this is because
there might be a larger interquartile range (or similar) within the small
population because of the lower quantity of results?
No, that is not the reason.
When you compare two means, the test uses the
Standard Error of each mean. The SE is the Standard
Deviation (pooled, or for equal variances) divided by
the square root of N. The mean and SE are considered
for each group. When you have a fixed N to divide
between two groups, the way to get the greatest precision
for both groups at once is to have equal Ns; when Ns are
unequal, the precision lost for the smaller N is worse than
the precision gained for the larger N. In fact, this problem
can be quantified by computing the "equivalent N" for the
analysis, using the <reciprocal of the average of the reciprocals>,
i.e. the harmonic mean of the two Ns.
That is, given Ns of 10 and 100, the average of .1 and .01 is
.055; the reciprocal of that is 18.2 -- Thus, allocating 110
people unequally gives the same power as comparing 18
versus 18.
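
The arithmetic above can be checked with a short sketch. The function
names here are made up for illustration, and a common SD of 1.0 is
assumed for both groups. It computes the equivalent N and confirms
that the SE of the difference for 10 versus 100 matches the SE for two
equal groups of that size.

```python
import math

def equivalent_n(n1, n2):
    """Reciprocal of the average of the reciprocals (harmonic mean)."""
    return 1.0 / ((1.0 / n1 + 1.0 / n2) / 2.0)

def se_of_difference(sd, n1, n2):
    """SE of the difference between two means, assuming a common SD."""
    return sd * math.sqrt(1.0 / n1 + 1.0 / n2)

n_eq = equivalent_n(10, 100)
print(round(n_eq, 1))                      # 18.2

# Same SE for the unequal split and for two groups of size n_eq
# (sd = 1.0 is an assumed value, used only for illustration).
print(se_of_difference(1.0, 10, 100))
print(se_of_difference(1.0, n_eq, n_eq))
```

Splitting the same 110 people as 55 versus 55 gives an equivalent N of
55, which is the sense in which the unequal split wastes precision.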
Occasionally -- when variances are unequal -- there *can*
be an advantage in having a greater N for the group that
is more variable, if the test takes that into account, like
the "t-test for unequal variances." Then the ideal is to
vary the Ns in proportion to the two standard deviations,
which minimizes the SE of the difference.
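
A rough numerical check of how the split matters when the SDs differ.
Everything in this sketch is an assumption for illustration: the
function names, the SDs of 1 and 3, and the total N of 120. It computes
the SE of the difference for any split and brute-forces the split with
the smallest SE. With these numbers the search lands on Ns in the same
ratio as the SDs.

```python
import math

def se_diff(sd1, sd2, n1, n2):
    """SE of the difference between two means with separate variances."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def best_split(sd1, sd2, total):
    """Return the n1 (group 1 size) that minimizes the SE of the
    difference, searching every integer split of `total`."""
    return min(range(1, total),
               key=lambda n1: se_diff(sd1, sd2, n1, total - n1))

sd1, sd2, total = 1.0, 3.0, 120     # assumed values, not from the thread
n1 = best_split(sd1, sd2, total)
print(n1, total - n1)               # 30 90

print(se_diff(sd1, sd2, 60, 60))    # equal Ns
print(se_diff(sd1, sd2, n1, total - n1))  # more N for the variable group
```

The 30/90 split gives a smaller SE than 60/60, so the more variable
group does benefit from the larger N here.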
[snip, rest]
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html