Re: Reduction of variance due to grouping



Hi

Thank you very much for your responses. I AM conditioning it on the
actual observed numbers. I have the list of N numbers and the number of
desired groups n. I can compute the mean of the N numbers, their
variance, their skew, whatever I want, and I also know N and n. From
this information, can I estimate the variance after grouping? Or, at
least, can I compare the variance-after-grouping of one list with that
of another? For example, larger values of N typically lead to lower
variance-after-grouping (this is what Frank meant by saying that as
N -> inf, variance-after-grouping -> 0), but in my case N is a small
finite number, typically between 10 and 100.
Consider, for instance, these two distributions (the distributions I am
comparing will always have the same sum, here 132):

  119 7 2 3 1
  36 32 27 24 13
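For lists as short as these, the minimum variance-after-grouping can simply be computed exactly by trying every assignment of items to groups, which gives a ground truth against which any cheap indicator could be checked. A minimal sketch follows; the function name `best_grouping_variance` is hypothetical, and it uses the population variance of the group sums (dividing by the number of groups).

```python
from itertools import product
from statistics import pvariance


def best_grouping_variance(values, n):
    """Exhaustively search all ways of splitting `values` into `n` non-empty
    groups and return (variance of group sums, groups) for the best one.

    The variance is the population variance (divide by n). The search visits
    n ** len(values) label assignments, so keep the input small.
    """
    if not 1 <= n <= len(values):
        raise ValueError("need 1 <= n <= len(values)")
    best_ss, best_labels = None, None
    for labels in product(range(n), repeat=len(values)):
        if len(set(labels)) < n:  # some group would be empty
            continue
        sums = [0] * n
        for value, label in zip(values, labels):
            sums[label] += value
        # The total is fixed, so the smallest sum of squared group sums
        # is also the smallest variance of the group sums.
        ss = sum(s * s for s in sums)
        if best_ss is None or ss < best_ss:
            best_ss, best_labels = ss, labels
    groups = [[v for v, lab in zip(values, best_labels) if lab == g] for g in range(n)]
    return pvariance([sum(g) for g in groups]), groups


var, groups = best_grouping_variance([36, 32, 27, 24, 13], 3)
print(var, groups)
```

With a hypothetical n = 3, the first list's best grouping is (119) (7) (2, 3, 1), with a variance of about 2812.7, while the second list's is (36) (32, 13) (27, 24), with a variance of 38. On the 8-number example quoted below, it returns 9.5 for 4 groups. The search grows as n**N, so it is only practical for small lists.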

Even without grouping, I can look at them and say that the second one
will have the lower variance-after-grouping. The following may be good
indicators: the number of items above the mean (a higher number
typically means higher variance-after-grouping), the variance of the
original distribution (before grouping), the one-sided variance of the
original distribution (the variance of just the items above the mean),
the number of elements in the distribution, etc. These are ad-hoc
indicators. Is there any study behind this, or any known result? Could
you kindly give me some pointers?
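The indicators listed above are cheap to compute for both example lists. A minimal sketch follows. The name `grouping_indicators` is hypothetical, and "one-sided variance" is taken here to be the upper semivariance (squared deviations above the mean, summed and divided by N). That is an assumption, since it is only one reading of "variance for items above mean".

```python
from statistics import fmean, pvariance


def grouping_indicators(values):
    """Ad-hoc indicators for one distribution (population variance, divide by N)."""
    m = fmean(values)
    above = [x for x in values if x > m]
    return {
        "N": len(values),
        "mean": m,
        "variance": pvariance(values),
        "count_above_mean": len(above),
        # Assumed meaning of "one-sided variance": upper semivariance.
        "one_sided_variance": sum((x - m) ** 2 for x in above) / len(values),
    }


for dist in ([119, 7, 2, 3, 1], [36, 32, 27, 24, 13]):
    print(dist, grouping_indicators(dist))
```

Both lists have mean 26.4. The first has one item above the mean and a population variance of 2147.84, while the second has three items above the mean and a variance of 61.84.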

This problem is motivated by the following scenario: I have multiple
attributes, each of which has a set of values with occurrence counts
(the number of records in a database table that contain that value). I
want to select the attribute that, when grouped into n groups based on
its values, will produce the most uniform groups (in terms of
occurrence counts).
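If "grouped into n groups based on their values" means that each group must be a contiguous range of sorted values (an assumption, since the post does not say so), the problem is no longer a general partitioning problem. The minimum-variance grouping can then be found exactly with dynamic programming in O(n·N²) time, which is cheap for N between 10 and 100. The function names and the attribute dictionaries below are hypothetical.

```python
def contiguous_group_variance(counts_by_value, n):
    """Smallest population variance of group sums when the values are sorted
    and cut into n contiguous, non-empty ranges.

    counts_by_value maps each attribute value to its occurrence count.
    """
    counts = [c for _, c in sorted(counts_by_value.items())]
    size = len(counts)
    if not 1 <= n <= size:
        raise ValueError("need 1 <= n <= number of distinct values")
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    inf = float("inf")
    # best[k][i]: minimum sum of squared group sums when the first i counts
    # are split into k contiguous groups.
    best = [[inf] * (size + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for k in range(1, n + 1):
        for i in range(k, size + 1):
            for j in range(k - 1, i):
                s = prefix[i] - prefix[j]
                best[k][i] = min(best[k][i], best[k - 1][j] + s * s)
    total = prefix[size]
    # var = sum(s^2)/n - (total/n)^2, written with an exact integer numerator.
    return (n * best[n][size] - total * total) / (n * n)


def most_uniform_attribute(attributes, n):
    """Return the name of the attribute whose contiguous grouping is most uniform."""
    return min(attributes, key=lambda name: contiguous_group_variance(attributes[name], n))


attrs = {"a": dict(enumerate([119, 7, 2, 3, 1])), "b": dict(enumerate([36, 32, 27, 24, 13]))}
print(most_uniform_attribute(attrs, 3))
```

Here the dictionary keys 0, 1, 2, ... stand in for attribute values, so the counts are grouped in the order listed. Note that the contiguity restriction can cost a lot. For the 8-number example quoted below, kept in the listed order, the best contiguous split into 4 groups is (50) (48) (33) (24, 14, 7, 5, 3), with a variance of 59.5 rather than the unrestricted 9.5.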

Many thanks,
Rick



glenbarnett@xxxxxxxxxxxxx wrote:
> Rick wrote:
> > Hi
> > Say I have the following distribution:
> > 50 48 33 24 14 7 5 3
> > The mean is 23 and the variance is 359 (dividing by N-1 = 7; it is
> > 314.5 if you divide by N = 8).
> >
> > I am now asked to divide the above numbers into 4 groups -- the goal is
> > to make the sum of each group as uniform as possible (basically to
> > reduce the variance as much as possible).
> >
> > This is the optimal grouping:
> > (50) (48) (33, 7, 3) (24, 14, 5)
> >
> > The sums are
> > 50 48 43 43
> >
> > The mean is 46 (obviously) and the new variance is 9.5 (dividing by
> > the number of groups, 4; it is about 12.7 with the n-1 divisor).
> >
> > We have reduced the variance dramatically by doing the grouping.
> > Obviously, the new variance depends on the original distribution and
> > the number of desired groups. We can easily compute the new variance
> > after doing the grouping; my question is whether it is possible to
> > estimate the new variance WITHOUT doing the grouping. Is there some
> > result in statistics that allows us to estimate the new variance
> > reasonably well (an approximate estimate is fine)? If so, could you
> > kindly point me to the right literature?
>
> This looks to be somewhat related to the famous bin-packing problem.
> I'd guess that finding the arrangement that minimises the variance
> between groups would be NP-hard (as the equivalent optimisation problem
> is in bin-packing).
>
> I'm not directly aware of results for your particular problem (the
> variance in sizes), but I'd agree that it depends on the probability
> distribution of the original numbers (assuming you don't condition on
> the actual observed numbers).
>
> Still, if you look around problems relating to the bin-packing one, you
> might locate some literature that is more relevant.
>
> Glen
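Glen's bin-packing comparison suggests the usual practical route. Finding the optimum is NP-hard in general (this is the multiway number partitioning problem), so a fast heuristic can be used instead. One standard choice, swapped in here rather than taken from the thread, is the greedy "largest first" rule (LPT): sort the numbers in decreasing order and put each one into the group with the smallest current sum. It is not guaranteed to be optimal, but its result is an upper bound on the achievable variance without any exhaustive search. The function name `lpt_grouping` is hypothetical.

```python
import heapq
from statistics import pvariance


def lpt_grouping(values, n):
    """Greedy largest-first grouping into n groups (a heuristic, not exact).

    Returns (population variance of the group sums, groups).
    """
    if n < 1:
        raise ValueError("need n >= 1")
    # Heap of (current group sum, group index); the smallest sum pops first.
    heap = [(0, g) for g in range(n)]
    groups = [[] for _ in range(n)]
    for v in sorted(values, reverse=True):
        total, g = heapq.heappop(heap)
        groups[g].append(v)
        heapq.heappush(heap, (total + v, g))
    return pvariance([sum(g) for g in groups]), groups


var, groups = lpt_grouping([50, 48, 33, 24, 14, 7, 5, 3], 4)
print(var, groups)
```

On Rick's 8-number example it reproduces the optimal grouping (50) (48) (33, 7, 3) (24, 14, 5) with variance 9.5. On larger or less skewed inputs it can miss the optimum.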
