Re: standard deviation and N-1

From: Rob Johnson (rob_at_trash.whim.org)
Date: 07/08/04


Date: Thu, 8 Jul 2004 11:40:01 +0000 (UTC)

In article <Pine.GSO.4.05.10407072147070.8955-100000@hercules.acsu.buffalo.edu>,
Victoria Florsheim <vf2@buffalo.edu> wrote:
>In high school, I learned that the formula for standard deviation has n in
>the denominator, but in college the book has N-1 in the denominator. What
>is the reason for this?
>
>So far, I found this in my book (By Yates, Moore, McCabe):
>Why do we average by dividing by n-1 rather than n? Because the sum of
>deviations is always zero, the last deviation can be found once we know
>the other n-1. So we are not averaging n unrelated numbers. Only n-1 of
>the squared deviations can vary freely, and we average by dividing the
>total by n-1. The n-1 is called the degrees of freedom of the variance or
>standard deviation.
>
>
>I sort of understand that, but could someone explain in simpler terms and
>expand on that? I'm still a little puzzled as to why n-1.

The point is that the sample mean, m_s, is not the distribution mean,
m_d. Suppose the distribution variance is v_d. The quantity n m_s is
the sum of the n variates in the sample. Recall that the mean of a sum
of variates is the sum of their means and, when the variates are
independent, the variance of the sum is the sum of their variances.
That is, the mean of the sum of the sample is n m_d and the variance of
the sum of the sample is n v_d. In other words,

    E[ (n m_s - n m_d)^2 ] = n v_d                              [1]

or equivalently,

    E[ (m_s - m_d)^2 ] = (1/n) v_d                              [2]
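
As a sanity check on [2], here is a small sketch that verifies it exactly
for one concrete case. The fair six-sided die and the sample size n = 3
are my own illustrative assumptions, not part of the argument; any finite
distribution and any n would do. It enumerates every equally likely
sample and averages (m_s - m_d)^2 using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution: a fair six-sided die (illustration only).
values = [1, 2, 3, 4, 5, 6]
n = 3  # sample size, chosen arbitrarily

# Distribution mean m_d and distribution variance v_d.
m_d = Fraction(sum(values), len(values))
v_d = sum((x - m_d) ** 2 for x in values) / len(values)

# Every ordered sample of size n is equally likely, so the expectation
# E[(m_s - m_d)^2] is the plain average over all of them.
samples = list(product(values, repeat=n))
mean_sq_err = sum((Fraction(sum(s), n) - m_d) ** 2 for s in samples) / len(samples)

print(mean_sq_err, v_d / n)  # prints: 35/36 35/36
```

Both numbers agree, which is what [2] says: the sample mean misses the
distribution mean by v_d/n on average (in the squared sense).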

Write the distribution variance as

    v_d

    = E[ (1/n) sum_{k=1}^n (x_k - m_d)^2 ]

    = E[ (1/n) sum_{k=1}^n ( (x_k - m_s)^2 + 2 (x_k - m_s)(m_s - m_d) + (m_s - m_d)^2 ) ]

    = E[ (1/n) sum_{k=1}^n (x_k - m_s)^2 ] + (1/n) v_d

    = E[ v_s ] + (1/n) v_d                                      [3]

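where v_s is the sample variance computed with n in the denominator.
The cross term drops out because the deviations x_k - m_s sum to zero,
and the last term comes from [2]. Solving [3] for v_d, we get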

    v_d = n/(n-1) E[ v_s ]                                      [4]

This is why, to estimate the distribution variance, we multiply the
sample variance (computed with n in the denominator) by n/(n-1). The
factors of n cancel, so the net effect is to divide the sum of squared
deviations by n-1 instead of n.
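
To see [4] in numbers, here is a minimal sketch under assumptions of my
own choosing (a fair six-sided die, samples of size n = 3, and a helper I
have named variance). It enumerates every equally likely sample and
computes the exact expected value of the sample variance with each
denominator.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution: a fair six-sided die (illustration only).
values = [1, 2, 3, 4, 5, 6]
n = 3  # sample size, chosen arbitrarily


def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    m_s = Fraction(sum(sample), len(sample))
    return sum((x - m_s) ** 2 for x in sample) / divisor


# Distribution mean and variance.
m_d = Fraction(sum(values), len(values))
v_d = sum((x - m_d) ** 2 for x in values) / len(values)

# Exact expectations over all equally likely ordered samples of size n.
samples = list(product(values, repeat=n))
e_div_n = sum(variance(s, n) for s in samples) / len(samples)
e_div_n1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(e_div_n, e_div_n1, v_d)  # prints: 35/18 35/12 35/12
```

Dividing by n gives (n-1)/n of v_d on average, while dividing by n-1
recovers v_d exactly. Python's standard statistics module offers both
conventions: pvariance divides by n and variance divides by n-1.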

Rob Johnson <rob@trash.whim.org>
take out the trash before replying


