Re: Splitting samples to minimize false positives?
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Sun, 30 Dec 2007 21:05:36 -0500
On Sun, 30 Dec 2007 11:02:37 -0800 (PST), adiamond@xxxxxxxxxxxxxx
wrote:
> Maybe this is master of the obvious: Apologies up front:
> By definition, if you sample a distribution and test with respect to a
> given alpha as to whether that sample came from that same distribution
> the false positive rate (fpr) will be alpha simply by chance.
> However, if you split the samples in half the failure rate will be the
> same but if you "and" the hypothesis results together (i.e. h1 is 0 if
What now defines a "failure", so that you can say
that the failure rate is the same? - Oh, you must be
pointing to the nominal rate.
... now you have TWO samples and TWO tests, and
you need rules for combining them. They will *not*
always give the same outcome. If the overall test
(full sample) happens to be just barely under 5% (say),
it is almost assured that at least one of the two half-sample
tests will not reach 5%, and it could easily happen that
neither of them will -- owing to reduced power from the
reduced Ns.
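To make the point about a barely significant full-sample result concrete, here is a minimal sketch. It assumes a one-sided z-test with known variance and two equal-sized halves, which is an illustrative simplification and not the setup of the original example; the names `full_sample_z`, `min_full_z` and `max_full_p` are hypothetical. Under those assumptions the full-sample statistic is (z1 + z2)/sqrt(2), so both halves can reach 5% only when the full sample is well beyond 5%.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(1 - ALPHA)  # one-sided critical value at 5%


def full_sample_z(z1, z2):
    """z statistic of the pooled sample, given the z statistics of two
    equal-sized halves (known-variance z-test assumed)."""
    return (z1 + z2) / sqrt(2)


# Smallest full-sample z that is compatible with BOTH halves rejecting.
min_full_z = full_sample_z(Z_CRIT, Z_CRIT)
# Largest full-sample p-value at which both halves can still reject.
max_full_p = 1 - STD_NORMAL.cdf(min_full_z)

print(f"half-sample critical z: {Z_CRIT:.3f}")
print(f"full-sample z needed for both halves to reject: {min_full_z:.3f}")
print(f"corresponding full-sample p-value: {max_full_p:.4f}")
```

In this simplified setting a full-sample p-value just under 5% rules out both halves rejecting at 5%. With estimated variances (t-tests) the relation is only approximate.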
> null hypothesis and 1 otherwise for the first half sample, h2 is the
> same for the second, and h=h1 AND h2 is the final hypothesis result
> for the entire sample) that the fpr will diminish quadratically (see
> matlab below).
The problem with this idea is that the POWER of
finding a tiny, real difference decreases exactly as
dramatically as the effective p-value (size) of the test.
That is a logical, mathematical relationship between them.
If you want a test with reduced power, it is simpler
to use one test with a smaller p-level in the first place.
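A rough numerical sketch of that comparison, again assuming a one-sided z-test with known unit variance and independent halves. The effect size `delta`, the sample size `n` and the helper `power_one_sided_z` are illustrative choices, not values from the thread. It matches the size of the AND rule (alpha squared) with a single full-sample test at the same smaller level and compares their power.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(delta, n, alpha):
    """Power of a one-sided z-test (known unit variance) to detect a
    true mean shift `delta` with `n` observations at level `alpha`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - delta * sqrt(n))


delta, n, alpha = 0.3, 100, 0.05  # hypothetical effect size and sample size

# AND rule: both half-sample tests must reject at level alpha.
power_half = power_one_sided_z(delta, n // 2, alpha)
power_and = power_half ** 2   # halves are independent
size_and = alpha ** 2         # false positive rate of the AND rule

# Single full-sample test run at the same overall false positive rate.
power_single = power_one_sided_z(delta, n, size_and)

print(f"size of both procedures:      {size_and:.4f}")
print(f"power, AND of two half tests: {power_and:.3f}")
print(f"power, one full-sample test:  {power_single:.3f}")
```

For these assumed numbers the single stricter test has the same false positive rate as the AND rule and more power, so the split buys nothing.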
I think, perhaps, you want to consider a different semantic
expression for testing, in terms of tolerances. You want
a 90% CI to the effect that the actual difference would reject
two samples being the same, using a 5% test. Again, you
get that by using a test much smaller than the original 5%.
> This is the basic concept of redundant systems.
You might have to explain "redundant systems" then,
with a close attention to probabilities....
> This, however, doesn't seem to be a standard statistical technique.
> Is that because the techniques for combining the results of multiple
> tests subsume this or what? After all, what is the proper thing to
> do if you only had one sample/test and it tested positive but there's
> the worry that it's just a fluke? Isn't it the case that if the
> sample really isn't drawn from the expected/comparison population that
> dividing it up shouldn't affect the fpr?
Dividing the sample in half gives you two tests,
neither of which has the power of the full-sample test.
If you use results as (A OR B), you have more power;
if you use results as (A AND B), you have less power
for proclaiming a difference.
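The effect of the two combining rules can be checked with a small simulation. This is a sketch under stated assumptions, not the example from the original post: each half-sample test is a one-sided z-test at 5% with known unit variance, so its statistic can be drawn directly as a normal variate, and `rejection_rates`, `delta` and `n` are hypothetical names and values.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.95)  # one-sided 5% critical value


def rejection_rates(delta, n, trials=100_000, seed=0):
    """Simulate two independent half-sample z-tests (n/2 observations each)
    and return how often test A alone, (A AND B), and (A OR B) reject."""
    rng = random.Random(seed)
    shift = delta * sqrt(n / 2)  # mean of each half-sample z statistic
    counts = {"A": 0, "A and B": 0, "A or B": 0}
    for _ in range(trials):
        a = rng.gauss(shift, 1.0) > Z_CRIT
        b = rng.gauss(shift, 1.0) > Z_CRIT
        counts["A"] += a
        counts["A and B"] += a and b
        counts["A or B"] += a or b
    return {rule: hits / trials for rule, hits in counts.items()}


null_rates = rejection_rates(delta=0.0, n=100)  # no real difference
alt_rates = rejection_rates(delta=0.3, n=100)   # small real difference

for rule in null_rates:
    print(f"{rule:8s} false positive rate {null_rates[rule]:.4f}   "
          f"power {alt_rates[rule]:.3f}")
```

Under the null, the AND rule rejects far less often than a single half test and the OR rule rejects more often; under the alternative, power moves in the same direction as the false positive rate in each case.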
[snip, example which I did not try to follow]
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html