Re: Statistical Ranking for Non-Normal Populations
From: Peter Hach (Willy.Yoda_at_gmx.net)
Date: 10/16/04
- Next message: Mohammad Ejaz: "Multivariate and Multinomial."
- Previous message: Peter Hach: "Re: Statistical Ranking for Non-Normal Populations"
- In reply to: Richard Ulrich: "Re: Statistical Ranking for Non-Normal Populations"
- Messages sorted by: [ date ] [ thread ]
Date: Sat, 16 Oct 2004 02:07:05 +0000 (UTC)
On Thu, 14 Oct 2004 22:36:57 -0400, Richard Ulrich wrote:
>On Wed, 13 Oct 2004 18:30:39 +0000 (UTC), Willy.Yoda@gmx.net (Peter
>Hach) wrote:
>
>> I need to perform (statistical) ranking of a number of large but
>> finite populations X[i] = (x[i][1], ... ,x[i][n]) in a scenario where
>> acquiring each x[i][j] is very expensive. I am looking for the
>> population X[i] with the smallest Sum or Average over the x[i][j]
>> (i.e. I am only interested in the top-ranked one). Furthermore, all
>> x[i][j] are strictly larger than 0.
>>
>> I've started looking into the statistical ranking techniques, and most
>> work I've seen assumes that samples are generated from normal
>> distributions (and their variances are equal). I suspect this is
>> because in these cases
>>
>> Sample-Mean - True-Mean
>> ----------------------- ~ Student's t distribution
>> sqrt(Sample-Variance/n)
>>
>> i.e. one is not reliant on knowing the true Variance...?
>> Unfortunately, my data is non-normal and the X[i]s may differ
>> significantly in variance.
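A minimal Python sketch of the quantity in the formula above (the function name t_statistic is hypothetical, not from the thread). The ratio has an exact t distribution with n - 1 degrees of freedom only when the x are independent draws from a normal distribution, which is exactly the assumption at issue here.

```python
import math
import statistics

def t_statistic(sample, true_mean):
    """(sample mean - true mean) / sqrt(sample variance / n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)  # unbiased estimate, n - 1 denominator
    return (mean - true_mean) / math.sqrt(var / n)
```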
>
>As I read it: You want to be able to state what assurance
>you may have that a particular 'finite population', among several
>of the same size, will have the largest mean, based on a partial
>sampling.
>
>The original distribution being non-normal is not much
>of a problem, so long as the sum is well-behaved.
>
>Differences in variance may not be as much of a problem
>if the samples are from the same family. Do you know
>they are always finite? Is there much systematic difference
>expected between samples?
>
>
>But when you allow arbitrary and varied distributions,
>you have little chance of placing much confidence
>in the observed ordering, based on a part-sample.
>
>- There are a *lot* of problems that are possible to work on
>in statistics, and that one (totally arbitrary PDFs) does not seem
>particularly interesting to me. I think you might be stuck with
>generalizing from very gross inequalities, based on observed
>variances. Or you could have even less precision, using ranks.
>
>If this is something real, there are probably a number of
>helpful conditions that could be assumed, and there will
>be a better chance that someone has worked on something
>similar.
Here's what I know about the distributions of the X[i]:
(a) There are significant but not outrageous differences in the values
of the X[i][j] (a factor of up to ~1000 between the largest and the
smallest). All X[i][j] are larger than 0.
(b) There is significant covariance between the distributions of X[i]
and X[i'], i.e. if X[i][j] is large compared to the other values in
X[i], X[i'][j] is likely to be large in relation to the other values
in X[i'].
(c) For every value of j I know upper and lower bounds on
X[k][j] (i.e. for all k: low[j] <= X[k][j] <= high[j]).
(d) The number n of total values in each population is large (at least
1000, in most cases more than 50000).
(e) Since I know the bounds on the values, I can compute a worst-case
variance for each X[i], if that would help matters...
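Points (c) and (e) allow a distribution-free bound. A minimal sketch, assuming the indices j are drawn uniformly at random and that lo = min(low[j]) and hi = max(high[j]) serve as global bounds (using the tighter per-j bounds would need a different argument). Hoeffding's inequality gives P(|sample mean - population mean| >= t) <= 2 exp(-2 m t^2 / (hi - lo)^2) for m draws, and Hoeffding (1963) showed that it also holds when sampling without replacement from a finite population. Popoviciu's inequality gives the worst-case variance (hi - lo)^2 / 4 asked about in (e). All function names are hypothetical.

```python
import math

def hoeffding_halfwidth(lo, hi, m, delta):
    """Half-width t with P(|sample mean - population mean| >= t) <= delta
    for m draws (with or without replacement) of values in [lo, hi]."""
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * m))

def popoviciu_max_variance(lo, hi):
    """Largest possible variance of any set of values confined to [lo, hi]."""
    return (hi - lo) ** 2 / 4.0

def mean_interval(sample, lo, hi, delta=0.05):
    """Distribution-free (1 - delta) confidence interval for the population mean."""
    m = len(sample)
    center = sum(sample) / m
    t = hoeffding_halfwidth(lo, hi, m, delta)
    # The true mean must also lie in [lo, hi], so clip the interval to that range.
    return max(lo, center - t), min(hi, center + t)
```

With values spanning a factor of ~1000, hi - lo is large, so these intervals are conservative. To compare k populations at once, a union bound (delta / k per population) keeps the overall error rate at delta; population i can then be declared best once its upper limit lies below every other population's lower limit.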
Is any of this useful?
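Point (b) suggests measuring every population at the same randomly chosen indices j and working with the paired differences x[i][j] - x[best][j]. When the populations are positively correlated, the variance of a difference is smaller than the sum of the separate variances. A minimal sketch under these assumptions: measure(i, j) is a hypothetical callback standing in for the expensive acquisition, the m indices are drawn without replacement (hence the finite-population correction 1 - m/n on the variance), and m >= 2.

```python
import math
import random
import statistics

def paired_comparison(measure, k, n, m, rng):
    """Compare k populations on the same m randomly chosen indices j.

    measure(i, j) is a hypothetical callback returning x[i][j].
    Returns (best, diffs): best is the index with the smallest sample mean, and
    diffs[i] = (estimated mean(X[i]) - mean(X[best]), standard error) for i != best.
    """
    idx = rng.sample(range(n), m)  # shared indices, drawn without replacement
    values = [[measure(i, j) for j in idx] for i in range(k)]
    best = min(range(k), key=lambda i: statistics.fmean(values[i]))
    fpc = 1.0 - m / n  # finite-population correction for sampling without replacement
    diffs = {}
    for i in range(k):
        if i == best:
            continue
        d = [a - b for a, b in zip(values[i], values[best])]
        se = math.sqrt(statistics.variance(d) / m * fpc)
        diffs[i] = (statistics.fmean(d), se)
    return best, diffs
```

An estimated difference several standard errors above zero suggests X[i] is genuinely worse than the leader (a normal approximation, reasonable for large m with bounded values); populations that are not clearly separated could be given more samples at fresh indices.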