Re: standard deviation, but without the mean




"Ray Koopman" <koopman@xxxxxx> wrote in message
news:1142035741.238276.44830@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
David A. Heiser wrote:
"Ray Koopman" <koopman@xxxxxx> wrote in message
news:1141669275.577944.112830@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
richardstartz@xxxxxxxxxxx wrote:
The standard deviation is the square root of the variance (of course).
There's a standard formula for computing the variance from a running
sum. Suppose Xsum is the sum of the the first n numbes and that X2 is
the sum of the squares. Then

var = X2/n - (Xsum/n)^2

Just keep track if X2 and Xsum as you go.

Using running totals can sometimes lead to cancellation errors.
See http://tinyurl.com/d6ax2 for a stable algorithm.

+++++++++++++++++++++++++++++++++++
Actually this is not a correct algorithm, just an approximation.

I've just gone through Welford's algorithm (a one pass calculation) using
xnumbers on the NIST data sets and have found Welford to obtain correct
values. The errors come in due to the general problem of summation of
lists
of numbers, which no algorithm can fix. The solution of course is to do
it
with as many digits as possible, so that the summation errors are not
important.

For example, running Welford's on the NIST NumAcc4 data set (1001 values)
using a 30 digit exact computation comes out to the theoretical value,
with
the error in the least 9 digits (21 accurate digits).. By using Kahan's
method of summation, this reduces the error to about 7 digits.

If people would make the effort to read Knuth, a lot of the
misconceptions
would disappear. Welford published in Technometrics, 1962, pg 419-420.
All
this is in Knuth. Apparently everybody else missed it.

David Heiser

The pseudocode that Miller attributes to Jennrich is what
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
calls Algorithm II, saying that it is "due to Knuth, who cites
Welford." Coding it in Mathematica as

n = m = ss = 0;
Scan[(d = # - m; m += d/(++n); ss += d*(# - m))&, data];
{m, Sqrt[ss/(n-1)]}

and applying it to the NumAcc4 data gives

{10000000.200000001, 0.10000000055890503}

for the mean and s.d.
+++++++++++++++++++++++++++++++++++++++++++
Gosh, I'm surprised at how poor Mathematica does on this. Doing it in Excel
using a 30 digit xnumber function (which I wrote) I get
0.10000000000000000000024682259.
DAH


.



Relevant Pages

  • Re: standard deviation, but without the mean
    ... Suppose Xsum is the sum of the the first n numbes and that X2 is ... Just keep track if X2 and Xsum as you go. ... Actually this is not a correct algorithm, ... the error in the least 9 digits.. ...
    (sci.stat.math)
  • Re: standard deviation, but without the mean
    ... There's a standard formula for computing the variance from a running ... Suppose Xsum is the sum of the the first n numbes and that X2 is ... Just keep track if X2 and Xsum as you go. ... Actually this is not a correct algorithm, ...
    (sci.stat.math)
  • "Algorithmic Randomness, Quantum Physics, and Incompleteness"
    ... all finite sequences are to be found infinitely often and ... different members of the infinite set of random numbers. ... Just using random numbers with initial digits 1 thru 9, ... it generated by a shorter input algorithm than the length ...
    (sci.logic)
  • Re: BigNum -- Floating Point
    ... > It means the memory required for representing a number is just ... write an RSA algorithm, for example. ... interest might be something like pi...in base N...to M digits. ... >> wouldn't the gcd() itself need to be able to handle bigints? ...
    (comp.programming)
  • every number has its own significance.....
    ... 18 is the only number that is twice the sum of its digits. ... 21 is the smallest number of distinct squares needed to tile a square. ... 26 is the only number to be directly between a square and a cube. ... 27 is the largest number that is the sum of the digits of its cube. ...
    (sci.crypt)