Re: Question of merging of variance



Ole Dahl Rasmussen wrote:
On Mar 14, 7:31 pm, "Ray Koopman" <koop...@xxxxxx> wrote:
Ole Dahl Rasmussen wrote:
Dear group

I have a question which I actually thought was simple, but still
puzzles me.

In short, I wish to compute the total variance for two separate groups
from the same sample, knowing their size, means and individual
variances only. The groups are different in size, and thus I need some
kind of weighting.

I found the description below on Wikipedia, but I am not sure
whether it holds, and it does not give a technique for the weighting.

Looking forward to hearing your thoughts.

Kind Regards,
Ole Dahl Rasmussen

- - -
Suppose that the observations can be partitioned into subgroups
according to some second variable. Then the variance of the total
group is equal to the mean of the variances of the subgroups plus the
variance of the means of the subgroups. This property is known as
variance decomposition or the law of total variance and plays an
important role in the analysis of variance. For example, suppose that
a group consists of a subgroup of men and an equally large subgroup of
women. Suppose that the men have a mean body length of 180 and that
the variance of their lengths is 100. Suppose that the women have a
mean length of 160 and that the variance of their lengths is 50. Then
the mean of the variances is (100 + 50) / 2 = 75; the variance of the
means is the variance of 180, 160 which is 100. Then, for the total
group of men and women combined, the variance of the body lengths will
be 75 + 100 = 175.

In a more general case, if the subgroups have unequal sizes, then they
must be weighted proportionally to their size in the computations of
the means and variances. The formula is also valid with more than two
groups, and even if the grouping variable is continuous.

A consequence of the formula is that the variance in the total group
cannot be smaller than the mean of the variances in the subgroups. In
general, if you combine subgroups with different means, the variance
becomes larger. In the above example, when the subgroups are analyzed
separately, the variance is influenced only by the man-man differences
and the woman-woman differences. If the two groups are combined,
however, the man-woman differences also enter into the variance.

If you have k groups whose sizes, means, and variances are
n_i, m_i, and v_i, i = 1,...,k, then:

1. the total size is N = sum n_i;

2. the total mean is M = sum n_i*m_i / N;

3. the total variance is V = sum n_i*v_i / N + sum n_i*(m_i-M)^2 / N.
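The three steps above can be sketched in Python as follows. The function name combine_groups and the unequal group sizes (30 and 10) are made up for illustration; the means and variances are the ones from the quoted example. The sketch assumes every v_i was computed with n_i in the denominator.

```python
def combine_groups(sizes, means, variances):
    """Return (N, M, V) for the pooled data from per-group n_i, m_i, v_i.

    Assumes each v_i was computed with divisor n_i, not n_i - 1.
    """
    N = sum(sizes)                                         # step 1
    M = sum(n * m for n, m in zip(sizes, means)) / N       # step 2
    within = sum(n * v for n, v in zip(sizes, variances)) / N
    between = sum(n * (m - M) ** 2 for n, m in zip(sizes, means)) / N
    return N, M, within + between                          # step 3


# Equal-sized groups from the quoted example (size 1 each stands in
# for "equally large"; only the ratio of the sizes matters).
print(combine_groups([1, 1], [180, 160], [100, 50]))    # (2, 170.0, 175.0)

# Hypothetical unequal sizes: 30 men and 10 women.
print(combine_groups([30, 10], [180, 160], [100, 50]))  # (40, 175.0, 162.5)
```

With unequal sizes the larger group pulls both the total mean and the weighted variance terms toward its own values, which is the weighting asked about.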


Thanks for the fast reply!

A follow-up question:

As I understand it, you calculate the total variance by adding the
Within variance, a weighted average of the group variances (sum
n_i*v_i / N), to the Between variance, a weighted average of the
squared deviations of the group means from the total mean (sum
n_i*(m_i-M)^2 / N). The latter is weighted by weighting the squared
deviations by the group sizes. Without the weighting, I assume this
would look like:

sum (m_i-M)^2 / k

Where does the k go in your formula above? Should it perhaps be

(sum n_i*(m_i-M)^2 / N)/k = sum n_i*(m_i-M)^2 / kN

k is hidden in N. If all the n_i were equal then we could write simply
n, and N = k*n; the n's in the formula would cancel, leaving only k.
But k does not play an explicit role in the general unequal-n case.
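Written out for the equal-size case, with n_i = n for every group and N = k*n, the cancellation looks like this:

```latex
\frac{1}{N}\sum_{i=1}^{k} n_i\,(m_i - M)^2
  = \frac{1}{kn}\sum_{i=1}^{k} n\,(m_i - M)^2
  = \frac{1}{k}\sum_{i=1}^{k} (m_i - M)^2
  \qquad (n_i = n,\; N = kn)
```

So dividing by N already includes the division by k; dividing by k*N would divide by k twice.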


For the more general case, intuitively, I (think I) get the logic:
Total variance is within variance plus between variance. But what is
the background for reasoning like this? If we acknowledge that the
Total Sum of Squares = Between Sum of Squares + Within Sum of Squares,
how do we get from there to the adding of the variances? It doesn't
seem trivial.

The Within sum of squares is just the weighted sum of the variances.
(Note that all the formulas assume that variances are computed using
n, not n-1, in the denominator.)
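One way to see the step from sums of squares to variances: since Within SS = sum n_i*v_i and Between SS = sum n_i*(m_i-M)^2, dividing Total SS = Between SS + Within SS by N gives the variance formula directly. The sketch below checks this numerically with Python's standard library; the data values are made up for illustration only.

```python
from statistics import fmean, pvariance, variance

# Made-up raw data for two groups of unequal size.
groups = [[1.0, 2.0, 3.0, 6.0], [10.0, 14.0]]
pooled = [x for g in groups for x in g]

N = len(pooled)
M = fmean(pooled)

# pvariance divides by n, matching the convention the formulas assume.
ss_total = sum((x - M) ** 2 for x in pooled)
ss_within = sum(len(g) * pvariance(g) for g in groups)
ss_between = sum(len(g) * (fmean(g) - M) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)   # the two agree
print(ss_total / N, pvariance(pooled))    # total variance, two ways

# If a variance was computed with n - 1 in the denominator instead,
# rescale it by (n - 1) / n before plugging it into the formulas.
g = groups[0]
v_n = variance(g) * (len(g) - 1) / len(g)
print(v_n, pvariance(g))                  # the two agree
```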


Looking forward to hearing opinions on this.

Ole
