Re: standard deviation and N-1

From: Rob Johnson (rob_at_trash.whim.org)
Date: 07/08/04


Date: Thu, 8 Jul 2004 11:40:01 +0000 (UTC)

In article <Pine.GSO.4.05.10407072147070.8955-100000@hercules.acsu.buffalo.edu>,
Victoria Florsheim <vf2@buffalo.edu> wrote:
>In high school, I learned that the formula for standard deviation has n in
>the denominator, but in college the book has N-1 in the denominator. What
>is the reason for this?
>
>So far, I found this in my book (By Yates, Moore, McCabe):
>Why do we average by dividing by n-1 rather than n? Because the sum of
>deviations is always zero, the last deviation can be found once we know
>the other n-1. So we are not averaging n unrelated numbers. Only n-1 of
>the squared deviations can vary freely, and we average by dividing the
>total by n-1. The n-1 is called the degrees of freedom of the variance or
>standard deviation.
>
>
>I sort of understand that, but could someone explain in simpler terms and
>expand on that? I'm still a little puzzled as to why n-1.

The point is that the sample mean, m_s, is not the distribution mean,
m_d. Suppose the distribution variance is v_d. The quantity n m_s is the
sum of the n variates in the sample. Recall that the mean of a sum of
variates is the sum of their means and, when the variates are
independent, the variance of the sum is the sum of their variances. That
is, the mean of the sum of the sample is n m_d and the variance of the
sum of the sample is n v_d. In other words,

    E[ (n m_s - n m_d)^2 ] = n v_d                              [1]

or equivalently,

    E[ (m_s - m_d)^2 ] = (1/n) v_d                              [2]
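As a rough numerical check of [2], here is a minimal sketch that assumes
the variates are independent and uniform on (0, 1), so that m_d = 1/2 and
v_d = 1/12. The distribution, the seed, the trial count, and the function
name are all choices made for this illustration, not part of the argument
above.

```python
import random

def mean_squared_error_of_sample_mean(n, trials, rng):
    """Estimate E[(m_s - m_d)^2] for samples of n uniform(0, 1) variates."""
    m_d = 0.5  # distribution mean of uniform(0, 1) (assumed example)
    total = 0.0
    for _ in range(trials):
        m_s = sum(rng.random() for _ in range(n)) / n  # sample mean
        total += (m_s - m_d) ** 2
    return total / trials

rng = random.Random(2004)  # fixed seed so the run is repeatable
v_d = 1.0 / 12.0           # distribution variance of uniform(0, 1)
for n in (2, 5, 10):
    estimate = mean_squared_error_of_sample_mean(n, 50_000, rng)
    # The two printed values should agree to within simulation noise.
    print(n, estimate, v_d / n)
```

The simulated value only approximates v_d/n; the agreement tightens as
the number of trials grows.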

Write the distribution variance as

    v_d

    = E[ (1/n) sum_{k=1}^n (x_k - m_d)^2 ]

    = E[ (1/n) sum_{k=1}^n ( (x_k - m_s)^2 + 2 (x_k - m_s)(m_s - m_d) + (m_s - m_d)^2 ) ]

    = E[ (1/n) sum_{k=1}^n (x_k - m_s)^2 ] + (1/n) v_d

    = E[ v_s ] + (1/n) v_d                                      [3]

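where v_s = (1/n) sum_{k=1}^n (x_k - m_s)^2 is the sample variance
(computed with n in the denominator). The cross term drops out because
sum_{k=1}^n (x_k - m_s) = 0, and the expected value of the last term is
given by [2]. Solving [3] for v_d, we get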

    v_d = (n/(n-1)) E[ v_s ]                                    [4]

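This is why, to get an unbiased estimate of the distribution variance,
we multiply the sample variance by n/(n-1). Since the sample variance
already has n in its denominator, the factors of n cancel, and it
appears as if we are dividing by n-1 instead of n.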
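The relation between the expected sample variance and the distribution
variance can also be checked exactly on a small case. The sketch below
assumes a fair six-sided die as the distribution (an example chosen here,
not taken from the question) and enumerates every equally likely sample
of n rolls with exact fractions. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Fair die: an assumed example distribution.
FACES = [Fraction(f) for f in range(1, 7)]
M_D = sum(FACES) / len(FACES)                          # distribution mean m_d
V_D = sum((f - M_D) ** 2 for f in FACES) / len(FACES)  # distribution variance v_d

def sample_variance(sample):
    """v_s: mean squared deviation from the sample mean (n in the denominator)."""
    n = len(sample)
    m_s = sum(sample) / n
    return sum((x - m_s) ** 2 for x in sample) / n

def expected_sample_variance(n):
    """Exact E[v_s] over all 6**n equally likely samples of n rolls."""
    samples = list(product(FACES, repeat=n))
    return sum(sample_variance(s) for s in samples) / len(samples)

for n in (2, 3, 4):
    e_vs = expected_sample_variance(n)
    # Rescaling E[v_s] by n/(n-1) should reproduce v_d exactly.
    print(n, e_vs, Fraction(n, n - 1) * e_vs, V_D)
```

In Python's standard library, statistics.pvariance divides by n and
statistics.variance divides by n-1, so the two differ by exactly the
factor n/(n-1) discussed here.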

Rob Johnson <rob@trash.whim.org>
take out the trash before replying


