Splitting samples to minimize false positives?



Maybe this is master of the obvious, so apologies up front.
By definition, if you sample a distribution and test at a given alpha
whether that sample came from that same distribution, the false
positive rate (fpr) will be alpha simply by chance.
However, if you split the sample in half, each half's failure rate will
still be alpha, but if you "and" the hypothesis results together (i.e.
h1 is 0 if the null hypothesis is retained and 1 otherwise for the first
half sample, h2 is the same for the second, and h = h1 AND h2 is the
final hypothesis result for the entire sample), the fpr will diminish
quadratically to alpha^2, since the two halves are independent (see
matlab below). This is the basic concept of redundant systems.
This, however, doesn't seem to be a standard statistical technique.
Is that because the techniques for combining the results of multiple
tests subsume this, or what? After all, what is the proper thing to
do if you only had one sample/test and it tested positive but there's
the worry that it's just a fluke? Isn't it the case that if the
sample really isn't drawn from the expected/comparison population,
dividing it up shouldn't affect the fpr?
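
A worked version of the quadratic claim, offered as a sketch rather than a definitive answer. It assumes the two halves are independent and that each is tested two-sided at the same level alpha. The symbols h_1 and h_2 follow the post; delta (a true mean shift in units of the known standard deviation), n (the full sample size), and the power function pi are notation introduced here for illustration. The value delta = 0.2 is a hypothetical example, not something taken from the data.

```latex
% False positive rate of the AND rule under H0 (independent halves):
\[
  \Pr(h = 1 \mid H_0)
    = \Pr(h_1 = 1 \mid H_0)\,\Pr(h_2 = 1 \mid H_0)
    = \alpha^2 ,
  \qquad \alpha = 0.05 \;\Rightarrow\; \alpha^2 = 0.0025 .
\]

% Power of a two-sided z-test at level \alpha on m points when the true
% mean is shifted by \delta standard deviations, where
% \Phi(z_{\alpha/2}) = 1 - \alpha/2:
\[
  \pi_\alpha(m) = \Phi\!\left(-z_{\alpha/2} + \delta\sqrt{m}\right)
                + \Phi\!\left(-z_{\alpha/2} - \delta\sqrt{m}\right),
  \qquad \pi_\alpha(m)\big|_{\delta = 0} = \alpha .
\]

% Detection rate of the AND rule under the same shift:
\[
  \Pr(h = 1 \mid \delta) = \pi_\alpha(n/2)^2 .
\]

% Hypothetical example: \delta = 0.2, n = 200, \alpha = 0.05.
\[
  \pi_{0.05}(200) \approx 0.81, \qquad
  \pi_{0.05}(100)^2 \approx 0.516^2 \approx 0.27, \qquad
  \pi_{0.0025}(200) \approx 0.42 .
\]
```

Under these assumptions the AND rule gets its lower fpr by giving up detection rate when the sample really is shifted, and a single test of the whole sample at level alpha^2 has the same fpr with more power in this example. That may be part of why the split is not treated as a standard technique.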

=======
% 1)
% Create 1000 trials, each a sample of 200 points drawn from a normal
% distribution, and z-test each sample against that same distribution
% to see how often the test rejects (i.e. the fpr).
samplesize = 200;
nTrials = 1000;
% X is a samplesize x nTrials sample matrix drawn from N(u, s^2)
u = 0; s = 1;
X = u + s*randn(samplesize,nTrials);
% Test each sample (i.e. column of X) at the default alpha = 0.05,
% 2-tailed, against mean u with known standard deviation s.
% h(i) is 0 if Ho is not rejected for sample i, 1 if it is rejected
% (i.e. the sample mean differs significantly from u).
[h,p,ci,zval] = ztest(X,u,s);
hfrac = mean(h) % fraction of positives (i.e. fpr).
% last result = 0.0620.

% 2)
% 2a)
% take that same data but test just the first half of each sample
Xh1 = X(1:samplesize/2,:);
[hh1,ph1,cih1,zvalh1] = ztest(Xh1,0,1);
hh1frac = mean(hh1)
% last result = 0.0420.
% 2b)
% take that same data but test just the second half of each sample
Xh2 = X(samplesize/2+1:end,:);
[hh2,ph2,cih2,zvalh2] = ztest(Xh2,0,1);
hh2frac = sum(hh2) / length(hh2)
% last result = 0.0480
% 2c) Combine the results of 2a and 2b by ANDing the hypothesis vectors.
% hh1and2(i) = hh1(i) AND hh2(i) ==> FP only if both halves are FPs.
hh1and2 = hh1 & hh2;
h12frac = sum(hh1and2) / length(hh1and2)
% last result = 1.0000e-003
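
As a rough sanity check on the numbers reported above, here is a sketch of what alpha = 0.05 and independent halves would predict for 1000 trials. N (the number of trials), K (the count of AND-rule positives), and the standard-error notation are introduced here; the Poisson step is an approximation, not an exact calculation.

```latex
% Sampling noise in an estimated rate from N = 1000 trials at \alpha = 0.05:
\[
  \mathrm{SE}(\hat{p}) = \sqrt{\frac{\alpha(1-\alpha)}{N}}
                       = \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.0069 .
\]

% Reported rates, in standard errors away from 0.05:
\[
  \frac{0.062 - 0.05}{0.0069} \approx 1.7, \qquad
  \frac{0.042 - 0.05}{0.0069} \approx -1.2, \qquad
  \frac{0.048 - 0.05}{0.0069} \approx -0.3 .
\]

% Expected number of AND-rule positives, and the approximate (Poisson)
% chance of seeing one or fewer:
\[
  N\alpha^2 = 1000 \times 0.0025 = 2.5, \qquad
  \Pr(K \le 1) \approx e^{-2.5}\,(1 + 2.5) \approx 0.29 .
\]
```

So the single positive in 1000 trials (1.0000e-003) is in line with a predicted rate of 0.0025, and the three individual rates are all within about two standard errors of 0.05.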