Re: Reduction of variance due to grouping
- From: "Rick" <forrest532002@xxxxxxxxx>
- Date: 12 Apr 2005 19:19:32 -0700
Hi
Thank you very much for your responses. I AM conditioning on the
actual observed numbers. I have the list of N numbers and the number
of desired groups n. I can compute the mean of the N numbers, their
variance, their skew, whatever I want. From this information, can I
estimate the variance after the grouping? Or, at least, can I compare
the variance-after-grouping of one list with that of another? For
example, we can say that larger values of N typically lead to lower
variance-after-grouping (this is what Frank meant by: as N->inf,
variance-after-grouping->0; but in my case N is a small finite number,
typically between 10 and 100).
For example, compare the two distributions:
119 7 2 3 1
and 36 32 27 24 13 (the distributions I am comparing will always have
the same sum).
Even without grouping, I can look at these and say that the second
one will have the lower variance-after-grouping. The following may be
good indicators: the number of items in the distribution above the
mean (a higher number typically means higher
variance-after-grouping), the variance of the original distribution
(before grouping), the one-sided variance of the original distribution
(the variance of just the items above the mean), the number of
elements in the distribution, etc. These are ad-hoc indicators -- is
there any study behind this? Is there any known result? Can you kindly
give me some pointers?
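
A minimal Python sketch of the indicators listed above, computed
without doing any grouping. The function names are hypothetical, the
divide-by-N (population) variance is an assumed convention, and the
"one-sided variance" is assumed to mean the average squared deviation
of the items above the mean. The second function is not from this
thread: it is a simple lower bound on the variance-after-grouping that
holds for non-negative counts, because any group holding an item larger
than the target sum (total / n) must exceed the target by at least that
item's excess.

```python
from statistics import mean, pvariance


def indicators(counts):
    """Ad-hoc indicators for one list of counts (no grouping performed)."""
    mu = mean(counts)
    above = [x for x in counts if x > mu]
    return {
        "n_items": len(counts),
        "n_above_mean": len(above),
        # Assumption: population variance (divide by N).
        "variance": pvariance(counts),
        # Assumption: mean squared deviation of the items above the mean.
        "one_sided_variance": (
            sum((x - mu) ** 2 for x in above) / len(above) if above else 0.0
        ),
    }


def variance_lower_bound(counts, n_groups):
    """Lower bound on the population variance of the n group sums.

    Assumes non-negative counts. Items larger than the target sum
    cannot be balanced by any grouping, so their squared excess over
    the target is unavoidable.
    """
    target = sum(counts) / n_groups
    excess = sum((x - target) ** 2 for x in counts if x > target)
    return excess / n_groups


if __name__ == "__main__":
    for counts in ([119, 7, 2, 3, 1], [36, 32, 27, 24, 13]):
        print(counts, indicators(counts), variance_lower_bound(counts, 2))
```

The bound is only a floor, not an estimate: it is zero whenever no
single item exceeds total / n, even though the best achievable variance
may still be positive.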
This problem is motivated by the following scenario: I have multiple
attributes, each of which has a set of values and their occurrence
counts (the number of records in a database table that contain that
value). I want to select the attribute that, when its values are
grouped into n groups, will produce the most uniform groups (in terms
of total occurrence counts).
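
For comparison, the attribute selection can be sketched in Python by
actually performing a cheap grouping. This is not an estimate made
without grouping, and it is not an optimal grouping either: it uses the
greedy "largest item into the currently smallest group" heuristic
(often called LPT in the scheduling literature), which is a technique
swapped in here for illustration. The function names and the attribute
names in the usage part are hypothetical, and the divide-by-n
(population) variance is an assumed convention.

```python
import heapq
from statistics import pvariance


def greedy_group_sums(counts, n_groups):
    """Heuristic grouping: place each count, largest first, into the
    group whose running sum is currently smallest. Returns the sorted
    group sums; the result is not guaranteed to be optimal."""
    sums = [0] * n_groups  # a list of equal values is already a valid heap
    for x in sorted(counts, reverse=True):
        smallest = heapq.heappop(sums)
        heapq.heappush(sums, smallest + x)
    return sorted(sums)


def variance_after_grouping(counts, n_groups):
    """Population variance of the heuristic group sums."""
    return pvariance(greedy_group_sums(counts, n_groups))


def most_uniform_attribute(attributes, n_groups):
    """Pick the attribute name whose heuristic grouping has the lowest
    variance. `attributes` maps a name to its list of counts."""
    return min(attributes, key=lambda a: variance_after_grouping(attributes[a], n_groups))


if __name__ == "__main__":
    # Hypothetical attribute names; the counts are the two lists above.
    attrs = {"attr_a": [119, 7, 2, 3, 1], "attr_b": [36, 32, 27, 24, 13]}
    print(most_uniform_attribute(attrs, 2))
    print(greedy_group_sums([50, 48, 33, 24, 14, 7, 5, 3], 4))
```

On the list from the original post quoted below (50 48 33 24 14 7 5 3,
4 groups), this heuristic happens to reproduce the sums 43, 43, 48, 50
and the variance of 9.5, which corresponds to dividing by n; the 359
quoted for the ungrouped list corresponds to dividing by N-1. On the
second list above with n = 2 it gives sums 60 and 72, whereas 64 and 68
is achievable, so the heuristic can overstate the achievable variance.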
Many thanks,
Rick
glenbarnett@xxxxxxxxxxxxx wrote:
> Rick wrote:
> > Hi
> > Say I have the following distribution:
> > 50 48 33 24 14 7 5 3
> > The mean is 23 and the variance is 359.
> >
> > I am now asked to divide the above numbers into 4 groups -- the goal
> > is to make the sum of each group as uniform as possible (basically
> > to reduce the variance as much as possible).
> >
> > This is the optimal grouping:
> > (50) (48) (33, 7, 3) (24, 14, 5)
> >
> > The sums are
> > 50 48 43 43
> >
> > The mean is 46 (obviously) and the new variance is 9.5
> >
> > We have reduced the variance dramatically by doing the grouping.
> > Obviously, the new variance depends on the original distribution
> > and the number of desired groups. We can easily compute the new
> > variance after doing the grouping -- my question is whether it is
> > possible to estimate the new variance WITHOUT conducting the
> > grouping. Is there some result in statistics that allows us to
> > estimate the new variance accurately (the estimate can be
> > approximate)? If yes, can you kindly point me to the right
> > literature?
>
> This looks to be somewhat related to the famous bin-packing problem.
> I'd guess that finding the arrangement that minimises the variance
> between groups would be NP-hard (as the equivalent optimisation
> problem is in bin-packing).
>
> I'm not directly aware of results for your particular problem (the
> variance in sizes), but I'd agree that it depends on the probability
> distribution of the original numbers (assuming you don't condition on
> the actual observed numbers).
>
> Still, if you look around problems relating to the bin-packing one,
> you might locate some literature that is more relevant.
>
> Glen