Re: Need to aggregate t-test stats



This is really a reply to all: thanks so much for your help. Lots of
good stuff. Jesus, what did I do before the internet!

I did stumble onto the Holm procedure. The Fisher procedure looks
easy, and I think the required independence exists, but I haven't been
able to google more about it (if anyone has a link, please post it).
It's interesting that it's barely mentioned in the multiple-test stuff
I've googled.
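
For anyone following along, here is a minimal sketch of Fisher's
method for combining independent p-values, written in Python with only
the standard library. The function name is made up for illustration.
The statistic is X = -2 * sum(ln p_i), which follows a chi-square
distribution with 2k degrees of freedom under the null hypothesis, and
the result is only valid if the k tests really are independent.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). Hypothetical helper, not from any library.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Test statistic: chi-square with 2k degrees of freedom under H0.
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For an even number of degrees of freedom (2k) the upper tail has a
    # closed form: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total
```

With a single p-value the combined result is just that p-value, which
makes a handy sanity check.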

W/r to Richard Ulrich's comments (to the extent that I understood and
haven't misinterpreted what you wrote): at the end of the day my main
goal is to have an automated script that validates the new version of
the db. The point being that it comes down to objective, automated
numbers. BTW, I'm also trying to limit myself to routines that can use
the summary stats generated by SQL (vs. actually pulling the samples).

Still, your post did make me think about what the most likely screw-up
is and how to home in on it. I'm betting that the most likely screw-up
is something that leads to lots of blank fields for a source. I imagine
a test of proportions will nail that very quickly, and with a
confidence level that would make combining a moot point. E.g., from my
original example, say there's a good number of French names and the
fraction of blanks goes from 1% to 80%: I guess the other stats don't
matter, the new db is "bad". Obviously, the fun starts when the p-value
is 1% but you've run so many tests that it's hard to say whether that's
an issue. But if the p-value is 0.0001% (not so nutty in the above
scenario), the call is easy, and I think that's the point you were
making.
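
The blank-field check needs only four counts per source (blanks and
total rows in the old and new db), so it fits the summary-stats-only
constraint. Below is a rough sketch of a pooled two-proportion z-test
in Python using the normal approximation. The function and argument
names are invented for illustration, and the approximation assumes
the counts are reasonably large.

```python
import math

def two_proportion_z_test(blank_old, n_old, blank_new, n_new):
    """Two-sided z-test for a change in the fraction of blank fields.

    Returns (z, p_value). Hypothetical helper using the normal approximation.
    """
    p_old = blank_old / n_old
    p_new = blank_new / n_new
    # Pooled proportion under the null hypothesis of no change.
    pooled = (blank_old + blank_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_old + 1.0 / n_new))
    if se == 0.0:
        # Every field blank, or none blank, in both versions: no difference.
        return 0.0, 1.0
    z = (p_new - p_old) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For a jump like 1% to 80% blanks on a thousand rows each, z is so
large that the p-value may underflow to zero in floating point, which
is the "other stats don't matter" case.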


ON ANOTHER TOPIC: (should be another post but hey, I'm here ;-)

I've been using the Perl Statistics::TTest package, which calls the
Statistics::Distributions package's fdistr function to get the F
critical value for a specified significance level (it does this
because it automatically runs an F test to see whether the variances of
the two samples given to the t-test are equal). But fdistr fails on
various df combinations (always when both are odd). Here are some
results from my test program:
FAILURE fdistr(23,67,0.025)
FAILURE fdistr(23,69,0.025)
FAILURE fdistr(23,71,0.025)
FAILURE fdistr(23,73,0.025)
FAILURE fdistr(23,75,0.025)
FAILURE fdistr(23,77,0.025)
FAILURE fdistr(23,79,0.025)

Has anyone seen this before? I've constructed a clumsy workaround, but
I have to believe that other Perl users must have done something more
sophisticated, as t-tests are pretty damn important (not to mention F
tests!). Anyway, if anyone is using Perl, please try this and tell me
if it fails for you too.
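
A side note on the F pre-test: Welch's unequal-variance t-test is a
different technique from what Statistics::TTest does, but it needs no
preliminary F test, so it never has to call fdistr. Here is a minimal
sketch in Python (not Perl) that works from summary stats alone:
mean, sample variance, and count for each sample. The function name is
hypothetical. It returns only the t statistic and the
Welch-Satterthwaite approximate degrees of freedom; turning those into
a p-value still requires a t distribution routine.

```python
import math

def welch_t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and approximate df from summary statistics.

    var1 and var2 are sample variances (n - 1 denominator).
    Hypothetical helper, not part of any Perl or Python package.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error contributed by each sample mean.
    se1 = var1 / n1
    se2 = var2 / n2
    if se1 + se2 == 0.0:
        raise ValueError("both variances are zero; t is undefined")
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

When the two variances and sample sizes are equal, df comes out to
2 * (n - 1), the same as the ordinary pooled t-test.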


The multiple variance analog to pairwise comparisons is known as Bartlett's test for the equality of several variances. Bartlett tests for simultaneous equality. This also assumes that your counts are large enough that you can treat them as normal. Here's a link.

http://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm

You can either apply Bartlett's test through a statistical software package, or simply type the formulas into an Excel spreadsheet.
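
If a spreadsheet is awkward, the formulas are short enough to script. The following is a minimal sketch in Python of the Bartlett statistic as given on the NIST page linked above, computed from per-group sample variances and sample sizes only. The function name is made up for illustration. It returns the statistic and its degrees of freedom (k - 1); the statistic is then compared against a chi-square critical value from a table or a stats package.

```python
import math

def bartlett_statistic(variances, sizes):
    """Bartlett's test statistic from group sample variances and sizes.

    Returns (statistic, degrees_of_freedom). Hypothetical helper.
    """
    k = len(variances)
    if k < 2 or k != len(sizes):
        raise ValueError("need matching variances and sizes for 2+ groups")
    if any(n < 2 for n in sizes) or any(v <= 0.0 for v in variances):
        raise ValueError("each group needs n >= 2 and a positive variance")
    n_total = sum(sizes)
    dfs = [n - 1 for n in sizes]
    # Pooled variance: weighted average of the group variances.
    pooled = sum(d * v for d, v in zip(dfs, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1.0 + (
        sum(1.0 / d for d in dfs) - 1.0 / (n_total - k)
    ) / (3.0 * (k - 1))
    return numerator / correction, k - 1
```

When all group variances are identical the statistic is zero, and it grows as the variances spread apart.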

Jack