Re: Sorting and testing multiple distributions



Hi Richard,

First of all, thanks for your comment!

On Wed, 27 Sep 2006, Richard Ulrich wrote:
Is this computer software? Appropriate questions for software,
concerning what a 'sample distribution' is ...

It is computer software, although for the analysis of biological data.

- Is it re-run exactly, with different background activities?
(Average the results, mention odd ones.)

Speed is not the issue; prediction quality is.

- Is it tested with different data sets?
(Matched comparisons *may* be appropriate.)

It is, with three different data sets. These data sets are
based on quite different cellular processes/subsystems and are therefore
also interesting from a biological perspective.

- Or is one 'sample distribution' a comparison of several
implementations for the same task (for instance, compiler speeds)?

A 'sample distribution' is one of:

- A bootstrap sample of prediction-quality benchmarks (e.g., areas under
ROC curves, AUC).

- A 'sample' of, e.g., AUCs after stratification by some factor, e.g.,
membership in a specific protein complex or enzymatic pathway.

The first case, however, is not the one I am really interested in. I use
the bootstrap samples only as a check that the performance estimate is
not highly variable (it is not: the variances are small).
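A minimal sketch of such a bootstrap check, not taken from the original post. It assumes each method produces a real-valued score per case plus a 0/1 label; the names auc and bootstrap_auc are hypothetical. The AUC is computed through its Mann-Whitney interpretation (the probability that a random positive case outscores a random negative one, ties counted as 1/2), and cases are resampled with replacement.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as P(random positive scores higher than random negative),
    the Mann-Whitney formulation; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_boot=1000, seed=0):
    """Resample cases with replacement; return (mean, sd) of bootstrap AUCs.
    Resamples that happen to lack one of the two classes are redrawn."""
    if n_boot < 2:
        raise ValueError("n_boot must be at least 2")
    if 0 not in labels or 1 not in labels:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 0 in ys and 1 in ys:
            aucs.append(auc([scores[i] for i in idx], ys))
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Hypothetical scores and labels, for illustration only.
    scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(auc(scores, labels))
    print(bootstrap_auc(scores, labels, n_boot=500))
```

A small bootstrap standard deviation relative to the differences between methods is what "not highly variable" amounts to here.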

The second case is more interesting, and the decision is less obvious
because of relatively large variances. This is what I want to test. The
homogeneity test, however, is just one small step, as the data allow
much more detailed analyses.

Benchmarking is most convincing, in what I have read, when
the *functional* units are explained. That is to say: one disk
drive is "faster" because it has faster seek times, faster spin,
and faster time-to-settle after a long seek. If these factors all line
up, it is hardly necessary to try to find a *statistical* test of
differences.

If by 'line up' you mean that the results are consistent: they are not,
at least from what I have seen so far. Nor are they *simply* as expected
from previous publications. Hence, I want to base my conclusions on
a sound and robust statistical foundation. There are "functional units",
but given the (at first sight) inconsistent results it could be hard to
associate them with the performance of specific methods.

If the factors are not lined up, it may be impossible
to draw any conclusion that does not depend on the exact
conditions of the test.

I think the only way to tackle this is to use diverse data sets, analyze
them, develop hypotheses about what the problem might be, test those
hypotheses, and eventually improve the methods. That's my plan!

(1) Sort the distributions by their means (they are not normally
distributed, however) and test distributions that are adjacent in the
ordered list with, e.g., Kendall's tau to determine significance.

"Not normal" is a weak excuse for transforming the data.

I see and agree: I would lose some information. I am not sure whether a
significant difference with, e.g., Kendall's tau or Spearman's rank
correlation implies that the differences are also significant in a
test that exploits more characteristics of the data. Is that the case?

Areas under ROC curves lie in the interval [0,1]. At the moment
I have no idea which statistical test procedure would be better.
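A rough sketch of approach (1): sort the groups by their mean and compare only neighbouring groups. The post suggests Kendall's tau; this sketch swaps in the Mann-Whitney U (Wilcoxon rank-sum) test, a common rank-based choice for two independent samples, using the normal approximation without tie or continuity correction. The names mann_whitney_p and adjacent_comparisons are hypothetical.

```python
import math
import statistics


def mann_whitney_p(x, y):
    """Two-sided p-value of the Mann-Whitney U test via the normal
    approximation (no tie or continuity correction; rough for small n)."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))


def adjacent_comparisons(groups):
    """Sort groups (dict: name -> values) by mean and test each neighbouring
    pair. Returns a list of (lower_name, higher_name, p_value)."""
    ordered = sorted(groups, key=lambda name: statistics.mean(groups[name]))
    return [(a, b, mann_whitney_p(groups[a], groups[b]))
            for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Hypothetical AUC samples per method, for illustration only.
    aucs = {"m1": [0.61, 0.58, 0.66, 0.70],
            "m2": [0.72, 0.75, 0.69, 0.80],
            "m3": [0.55, 0.60, 0.52, 0.63]}
    for lo, hi, p in adjacent_comparisons(aucs):
        print(lo, "<", hi, "p =", round(p, 4))
```

One caveat: the pairs are chosen after looking at the means, so these unadjusted p-values tend to be optimistic, and a non-significant neighbour pair says nothing about groups further apart in the ordering.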


(2) First test for homogeneity with the H-test (Kruskal-Wallis) and then use
one of the approaches for multiple pairwise comparisons of mean ranks
(chi^2; Harter, 1960; Tukey-Kramer) proposed in my statistics
book of choice (Sachs, Angewandte Statistik, [395]).

(3) Test all pairs of distributions and apply a multiple-testing
correction (Bonferroni, Benjamini, or similar).
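A sketch covering approaches (2) and (3) in plain Python, assuming each group is a list of AUC values. H is computed without the tie correction and is usually referred to a chi-square distribution with k - 1 degrees of freedom (k = number of groups). For the pairwise comparisons of mean ranks it uses Dunn's z statistic, a swapped-in example that is not necessarily one of the procedures listed in Sachs. The correction step assumes "Benjamini" refers to the Benjamini-Hochberg step-up procedure. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..N; tied values get the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # 0-based i..j -> mean 1-based rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, mean rank per group); H without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    mean_ranks, start, total = [], 0, 0.0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        start += len(g)
        total += r_sum ** 2 / len(g)
        mean_ranks.append(r_sum / len(g))
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    return h, mean_ranks


def dunn_pvalues(groups):
    """Unadjusted two-sided p-values of Dunn-style z tests on mean ranks
    for all pairs (no tie correction). Returns {(i, j): p}."""
    _, mean_ranks = kruskal_wallis(groups)
    n_total = sum(len(g) for g in groups)
    pvals = {}
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            se = math.sqrt(n_total * (n_total + 1) / 12.0
                           * (1.0 / len(groups[i]) + 1.0 / len(groups[j])))
            z = (mean_ranks[i] - mean_ranks[j]) / se
            pvals[(i, j)] = math.erfc(abs(z) / math.sqrt(2.0))
    return pvals


def bonferroni(pvals):
    """Family-wise error control: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """False-discovery-rate control: step-up adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical AUC samples for three strata, for illustration only.
    groups = [[0.61, 0.58, 0.66, 0.70], [0.72, 0.75, 0.69, 0.80],
              [0.55, 0.60, 0.52, 0.63]]
    h, _ = kruskal_wallis(groups)
    print("H =", round(h, 3))
    pairs = dunn_pvalues(groups)
    keys = list(pairs)
    raw = [pairs[k] for k in keys]
    for k, pb, pbh in zip(keys, bonferroni(raw), benjamini_hochberg(raw)):
        print(k, round(pairs[k], 4), round(pb, 4), round(pbh, 4))
```

For three groups (2 degrees of freedom), H above about 5.99 is significant at the 5% level. With k methods there are k(k-1)/2 pairs, so the Bonferroni factor grows quickly; Benjamini-Hochberg is less conservative but controls the false discovery rate rather than the family-wise error rate.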


"Statistical significance" is not the same as "meaningful
difference."

I know. It also depends on the sample size, which could in principle
be large enough to show even the tiniest difference with huge statistical
significance. A "meaningful difference", however, can only be established
after considering causal relationships.
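A back-of-the-envelope illustration of the sample-size point, under purely hypothetical assumptions: a fixed difference of 0.01 in mean AUC, a known common standard deviation of 0.05, and a two-sample z-test with equal group sizes. The difference never changes; only n does. The function name two_sample_z_p is hypothetical.

```python
import math


def two_sample_z_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, known common SD, equal n."""
    se = sd * math.sqrt(2.0 / n_per_group)
    return math.erfc(abs(diff / se) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical effect: mean AUCs differ by 0.01, SD 0.05 in both groups.
    for n in (10, 100, 1000, 10000):
        print(n, two_sample_z_p(0.01, 0.05, n))
```

Under these assumptions the p-value falls from roughly 0.65 at n = 10 to far below 0.001 at n = 1000, although the difference of 0.01 is the same throughout.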

If you have functional explanations for superiority, you won't
have much need for precise statistical testing, considering the
benchmark studies that I have read. And if you don't have
the functional explanations, then you have to base your
conclusions strictly on 'this set of tests.'

In the latter case it is questionable whether any benchmarking of methods
would be reasonable at all (at least on real data).

On the other hand, to improve we *must* assess what we have. So, in the end,
I think it *is* reasonable to benchmark, even if always with the
connotation 'given the present data'. New ways of looking at the
data are how we can find a purchase for improvements, aren't they?

Furthermore, besides the comparison of the methods themselves, there are other
dimensions: what can we learn about the data (here: biology) that might be
responsible for (possibly unexpected) findings?

There remains the question of how I can best determine which differences
are significant.


Thanks for your comment!
Kind regards,

Philip








