Re: Statistical Ranking for Non-Normal Populations

From: Peter Hach (Willy.Yoda_at_gmx.net)
Date: 10/16/04


Date: Sat, 16 Oct 2004 02:07:05 +0000 (UTC)

On Thu, 14 Oct 2004 22:36:57 -0400, Richard Ulrich wrote:
>On Wed, 13 Oct 2004 18:30:39 +0000 (UTC), Willy.Yoda@gmx.net (Peter
>Hach) wrote:
>
>> I need to perform (statistical) ranking of a number of large, but
>> finite populations X[i] = (x[i][1], ... ,x[i][n]) in a scenario where
>> acquiring each x[i][j] is very expensive. I am looking for the
>> population X[i] with the smallest Sum or Average over the x[i][j]
>> (i.e. I am only interested in the top-ranked one). Furthermore, all
>> x[i][j] are strictly larger than 0.
>>
>> I've started looking into the statistical ranking techniques, and most
>> work I've seen assumes that samples are generated from normal
>> distributions (and their variances are equal). I suspect this is
>> because in these cases
>>
>> Sample-Mean - True-Mean
>> ----------------------- ~ Student's t distribution
>> sqrt(Sample-Variance/(n))
>>
>> i.e. one is not reliant on knowing the true Variance...?
>> Unfortunately, my data is non-normal and the X[i]s may differ
>> significantly in variance.
>
>As I read it: You want to be able to state what assurance
>you may have that a particular 'finite population', among several
>of the same size, will have the smallest mean, based on a partial
>sampling.
>
>The original distribution being non-normal is not much
>of a problem, so long as the sum is well-behaved.
>
>Differences in variance may not be as much of a problem
>if the samples are from the same family. Do you know
>they are always finite? Is there much systematic difference
>expected between samples?
>
>
>But when you allow arbitrary and varied distributions,
>you don't have much chance of placing much confidence
>in the observed ordering, based on a part-sample.
>
>- There are a *lot* of problems that are possible to work on
>in statistics, and that one (totally arbitrary PDFs) does not seem
>particularly interesting to me. I think you might be stuck with
>generalizing from very gross inequalities, based on observed
>variances. Or you could have even less precision, using ranks.
>
>If this is something real, there are probably a number of
>helpful conditions that could be assumed, and there will
>be a better chance that someone has worked on something
>similar.

Here's what I know about the distributions of the X[i]:

(a) There are significant, but not outrageous differences in the values
of the x[i][j] (a factor of up to ~1000 between the largest and the
smallest). All x[i][j] are larger than 0.
(b) There is significant covariance between the distributions of X[i]
and X[i'], i.e. if x[i][j] is large compared to the other values in
X[i], x[i'][j] is likely to be large in relation to the other values
in X[i'].
(c) For every value of j I know upper and lower bounds on
x[k][j] (i.e. for all k: low[j] <= x[k][j] <= high[j]).
(d) The number n of total values in each population is large (at least
1000, in most cases more than 50000).
(e) Since I know the bounds on the values, I can compute a worst-case
variance for each X[i], if that would help matters...
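
A minimal sketch of what (c) and (e) could give, assuming the crudest
route: take the global range [min(low), max(high)] for a population, bound
its variance by Popoviciu's inequality, and bound the error of a sample
mean by Hoeffding's inequality (which also holds for sampling without
replacement). The function names are made up for illustration, and with a
factor of ~1000 between values these bounds may well be too loose to be
practical.

```python
import math

def worst_case_variance(low, high):
    """Popoviciu bound: any value set inside [a, b] has variance <= (b - a)^2 / 4."""
    a, b = min(low), max(high)
    return (b - a) ** 2 / 4.0

def hoeffding_half_width(low, high, m, delta=0.05):
    """Half-width t such that |sample mean - true mean| <= t with
    probability >= 1 - delta, for m values drawn uniformly from a
    population whose values all lie in [min(low), max(high)]."""
    a, b = min(low), max(high)
    return (b - a) * math.sqrt(math.log(2.0 / delta) / (2.0 * m))

# Example with made-up bounds for three positions j
low = [1.0, 2.0, 0.5]
high = [3.0, 5.0, 4.0]
print(worst_case_variance(low, high))
print(hoeffding_half_width(low, high, m=200))
```

The half-width shrinks only like 1/sqrt(m), so halving it costs four times
as many (expensive) samples.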

Is any of this useful?
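
Point (b) suggests comparing two populations on the same sampled indices
j, so that the variation they share cancels in the differences. A rough
sketch, assuming simple random sampling without replacement and that a
standard error of the mean difference is a usable measure of uncertainty;
get_a and get_b are hypothetical stand-ins for the expensive lookups of
x[i][j] and x[i'][j].

```python
import math
import random
import statistics

def paired_difference(get_a, get_b, n, m, rng):
    """Estimate mean(X_a) - mean(X_b) from m common indices out of n.

    Returns (mean difference, standard error). The standard error uses the
    finite population correction (1 - m/n), so it is 0 when m == n.
    Requires 2 <= m <= n.
    """
    idx = rng.sample(range(n), m)            # same indices for both populations
    d = [get_a(j) - get_b(j) for j in idx]   # paired differences
    mean_d = statistics.fmean(d)
    var_d = statistics.variance(d)           # sample variance of the differences
    se = math.sqrt((1.0 - m / n) * var_d / m)
    return mean_d, se

# Synthetic example: two populations driven by a common per-index factor
rng = random.Random(0)
base = [rng.uniform(1.0, 1000.0) for _ in range(5000)]
xa = [v * rng.uniform(0.9, 1.1) for v in base]
xb = [v * rng.uniform(1.0, 1.2) for v in base]
mean_d, se = paired_difference(xa.__getitem__, xb.__getitem__, len(xa), 200, rng)
print(mean_d, se)
```

A negative mean difference that is several standard errors away from zero
would favour the first population as the one with the smaller mean.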


