Re: Any tricks to get variance of continuous data?



On Thu, 26 May 2005 00:48:38 +0100, Dave <nospam@xxxxxxxxxxx> wrote:

>I have an instrument measuring the frequency of an oscillator. The exact
>frequency is printed every second. Ideally it should be 10MHz, but in
>practice it is not exactly. I want to find the mean and variance - not
>of a set of data that has been collected in the past, but to update this
>every second as a result of the data collected in the last second.
>
>
>The first few data points (in Hz) are like this, and you can see I have
>computed the mean in the second column.
>
>
>9999999.934230 mean=9999999.934230
>9999999.933640 mean=9999999.933935
>9999999.934420 mean=9999999.934097
>9999999.936180 mean=9999999.934617
>9999999.936770 mean=9999999.935048
>9999999.936770 mean=9999999.935335
>9999999.935010 mean=9999999.935289
>9999999.934620 mean=9999999.935205
>
>Computing the mean is easy, with no need to store past results. I just
>keep a total of all previous results, and divide by the number of
>results. (In fact, I modify that a bit, so I am not having to find the
>mean of a lot of very large, but almost identical numbers. I actually
>subtract 10,000,000 first, to avoid handling large numbers.)
>
>So for numerical accuracy I have something like the following in a loop,
>where a function 'new_data_point' computes the latest data.
>
>frequency=new_data_point();
>n=n+1
>sum=sum+(frequency-10000000)
>mean=10000000+sum/n
>
>Is there any way I can find the variance, without storing all past
>results? If I do this for about a week, there will be 600,000 data
>points. Whilst storing the data is not too much of a hassle, computing
>
>(x_n - mean)^2
>
>for all 600,000 values of n will be computationally expensive. Soon the
>data will be arriving faster than I can compute the SD.
>
>Are there any ways around this?

Writing u for the mean of the n values, the variance (with divisor n) is

s^2 = (1/n) sum[i=1..n] (x_i - u)^2

    = (1/n) sum[i=1..n] (x_i^2 - 2 x_i u + u^2)

    = (1/n) sum[i=1..n] x_i^2 - 2u (1/n) sum[i=1..n] x_i + u^2

    = (1/n) sum[i=1..n] x_i^2 - 2 u^2 + u^2

    = (1/n) sum[i=1..n] x_i^2 - u^2

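So you can keep a running sum of the squares x_i^2, just as you already
do for the x_i, and get the variance from the two sums at every step.
Subtracting a constant from every x_i does not change the variance, so
you can apply this to the offsets x_i - 10000000 you already use; that
keeps both terms small and limits the precision lost in the final
subtraction.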
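
A minimal sketch of that bookkeeping in Python, using the eight readings
quoted above. The class name RunningStats and its offset parameter are
made up for illustration; the offset is assumed to be the nominal
10 MHz that the original post already subtracts. The variance uses
divisor n, as in the derivation; multiply by n/(n-1) if you want the
n-1 form.

```python
class RunningStats:
    """Running mean and variance from two accumulated sums, no stored history."""

    def __init__(self, offset=10_000_000.0):
        # Assumed nominal value, subtracted from each reading before
        # accumulating so the sums stay small. Shifting by a constant
        # changes the mean by that constant and leaves the variance alone.
        self.offset = offset
        self.n = 0
        self.total = 0.0     # running sum of (x - offset)
        self.total_sq = 0.0  # running sum of (x - offset)^2

    def add(self, x):
        d = x - self.offset
        self.n += 1
        self.total += d
        self.total_sq += d * d

    def mean(self):
        return self.offset + self.total / self.n

    def variance(self):
        # (1/n) sum d_i^2 - (mean of d)^2, i.e. the last line of the derivation
        m = self.total / self.n
        return self.total_sq / self.n - m * m


# The eight readings (in Hz) from the original post.
data = [
    9999999.934230, 9999999.933640, 9999999.934420, 9999999.936180,
    9999999.936770, 9999999.936770, 9999999.935010, 9999999.934620,
]

stats = RunningStats()
for x in data:
    stats.add(x)
    print(f"{x:.6f} mean={stats.mean():.6f} var={stats.variance():.3e}")
```

Each update costs a handful of arithmetic operations regardless of how
many readings have arrived, so 600,000 points a week is no burden. The
final subtraction can still lose digits if the offset is far from the
true mean, which is why the sketch subtracts the nominal frequency
first.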

--Lynn