Re: standard deviation and N-1

From: Ray Koopman (koopman_at_sfu.ca)
Date: 07/08/04


Date: 8 Jul 2004 01:56:32 -0700

Victoria Florsheim <vf2@buffalo.edu> wrote in message
news:<Pine.GSO.4.05.10407072147070.8955-100000@hercules.acsu.buffalo.edu>...
> In high school, I learned that the formula for standard deviation
> has n in the denominator, but in college the book has N-1 in the
> denominator. What is the reason for this?
>
> So far, I found this in my book (By Yates, Moore, McCabe):
> Why do we average by dividing by n-1 rather than n? Because the sum
> of deviations is always zero, the last deviation can be found once
> we know the other n-1. So we are not averaging n unrelated numbers.
> Only n-1 of the squared deviations can vary freely, and we average
> by dividing the total by n-1. The n-1 is called the degrees of
> freedom of the variance or standard deviation.
>
> I sort of understand that, but could someone explain in simpler terms
> and expand on that? I'm still a little puzzled as to why n-1.

Short answer (what, but not why): If your data is a sample from a
population, and you want to estimate the variance in the population,
then use n-1. But if you're only interested in describing the data,
treating it as a population in its own right, then use n.
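
To make the two choices concrete, here is a small sketch using Python's standard statistics module. The data values are made up purely for illustration; pvariance divides by n and variance divides by n-1.

```python
import statistics

# Made-up illustrative values (not from the original post).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Describe the data as a population in its own right: divide by n.
pop_var = statistics.pvariance(data)

# Treat the data as a sample and estimate the population variance:
# divide by n-1.
sample_var = statistics.variance(data)

print("divide by n:  ", pop_var)
print("divide by n-1:", sample_var)
```

The sum of squared deviations is the same in both calls; only the divisor changes, so the n-1 version is always the larger of the two.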

Longer answer: First, the argument focuses on the variance, not the
standard deviation, because the variance is easy to deal with but
the s.d. is awkward. Second, the problem involves the value, M, from
which the deviations, Xi - M, are taken. If M = the population mean
then we divide the sum of squared deviations by n. Everybody agrees on
that. But if M = the sample mean then the sum of squared deviations
will be smaller than it would be if M were the population mean.
This is because one way of defining the sample mean is that it's
the value from which the sum of squared deviations is smallest.
On average, the sum of squared deviations from the sample mean is
(n-1)/n times as big as it would be if the deviations were taken
from the population mean. So you can think of the formula as first
multiplying the sum of squared deviations from the sample mean
by n/(n-1), to make it about as big as it would have been if the
deviations had been taken from the population mean; and then dividing
by n, as if the deviations had actually been taken from the population
mean. Putting the two operations together gives the division by n-1.
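
If you would rather see the (n-1)/n factor than take it on faith, a rough simulation along the following lines should show it. This is only a sketch: the normal population, its mean and s.d., the sample size, and the number of trials are all arbitrary assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 2.0  # hypothetical population mean and s.d.
n = 5                  # sample size
trials = 20_000        # number of simulated samples

ss_about_sample_mean = 0.0  # running total of SS taken from the sample mean
ss_about_pop_mean = 0.0     # running total of SS taken from the population mean
never_larger = True         # does SS about the sample mean ever exceed the other?

for _ in range(trials):
    xs = [random.gauss(MU, SIGMA) for _ in range(n)]
    m = sum(xs) / n
    ss_m = sum((x - m) ** 2 for x in xs)
    ss_mu = sum((x - MU) ** 2 for x in xs)
    # Small tolerance guards against floating-point rounding.
    if ss_m > ss_mu + 1e-9:
        never_larger = False
    ss_about_sample_mean += ss_m
    ss_about_pop_mean += ss_mu

ratio = ss_about_sample_mean / ss_about_pop_mean
print("observed ratio:", ratio)
print("(n-1)/n:       ", (n - 1) / n)
print("sample-mean SS never larger:", never_larger)
```

In every single sample the sum of squares about the sample mean is no larger than the one about the population mean, and averaged over many samples the ratio of the two should settle near (n-1)/n, which is 0.8 for n = 5.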


