# Re: Calculating Variance in real-time

From: Paolo Bellutta (bellutta_at_horsepower.csail.mit.edu)
Date: 29 Jun 2004 22:48:45 GMT

Michael Newberry <mnewberry@axres.com> wrote:
> Paolo,

> Of course, the mean and variance are computed from sums of values and
> squares---the "one pass" method is the only sensible way to do it. And also,
> the ordering is irrelevant for his 0-dimensional statistics (i.e., stats do
> not involve position on the image). But he cannot remove and replace the
> values and squares in the running sums unless he knows exactly which values
> are changing. In other words, he needs to know exactly which image pixels at
> what locations changed value in one image relative to the previous image.
> This is required if he is to remove the old values and insert the new
> values into the running sums to efficiently update the mean and std dev. If
> he does not know which pixels remained constant between images, he does not
> know which values to keep in the running sums. I thought those issues were
> pretty obvious.

No, it was not obvious to me. I did not gather from the OP that he wanted to
compute the PIXEL variance. Even so, the same technique can be applied if
you keep an "image" with the running sums and running sum of squares. If
this is a sliding window you still need to store the N previous images,
but still it is a feasible proposition for a reasonably sized image.
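
A minimal sketch of that per-pixel version, assuming frames arrive as flat sequences of numbers and that the population variance is what is wanted. The class and method names are made up for illustration:

```python
from collections import deque


class PixelWindowStats:
    """Per-pixel mean/variance over the last `window` frames (hypothetical helper)."""

    def __init__(self, num_pixels, window):
        self.window = window
        self.frames = deque()            # the N previous frames must be kept
        self.s = [0.0] * num_pixels      # "image" of running sums
        self.s2 = [0.0] * num_pixels     # "image" of running sums of squares

    def push(self, frame):
        """Add a frame, first removing the oldest one if the window is full."""
        if len(self.frames) == self.window:
            old = self.frames.popleft()
            for i, v in enumerate(old):
                self.s[i] -= v
                self.s2[i] -= v * v
        self.frames.append(list(frame))
        for i, v in enumerate(frame):
            self.s[i] += v
            self.s2[i] += v * v

    def mean(self):
        n = len(self.frames)
        return [s / n for s in self.s]

    def variance(self):
        """Population variance per pixel: (n*S2 - S*S) / n^2."""
        n = len(self.frames)
        return [(n * s2 - s * s) / (n * n) for s, s2 in zip(self.s, self.s2)]
```

Each new frame costs one subtraction pass and one addition pass over the pixels, independent of N; the price is the memory for the N stored frames.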

I apologize; I was thinking that the OP wanted to compute the image variance,
not a pixel-by-pixel variance.

Paolo

> Michael

> "Paolo Bellutta" <bellutta@horsepower.csail.mit.edu> wrote in message
> news:40e105bd$0$561$b45e6eb0@senator-bedfellow.mit.edu...
>> Michael Newberry <mnewberry@axres.com> wrote:
>> > Paolo,
>>
>> > With all due respect, what you are saying is not going to work. There is
>> > not assumed any correlation between one image and the next. Any pixel of an
>> > image has no relation with the corresponding pixel in any other image. So
>> > any stored information has no relevance to the new incoming image and a
>> > queue has no purpose here. There is absolutely nothing gained by storing
>> > pixel values from one image to the next.
>>
>> Michael, I am assuming you want to compute mean and variance of all the pixels
>> of N images. I think this is what the OP was asking. In this context pixels
>> are just data points, and you can rearrange them any way you want. Moreover,
>> it is a well known fact you can derive variance from the sum of the values of
>> all pixels and the sum of the square of their values. Given these two facts
>> you *can* use this method. It is very fast, I have been using this for years.
>>
>> Paolo
>> > Michael
>> > .
>> > "Paolo Bellutta" <bellutta@yahoo.com> wrote in message
>> > news:40ddd3cc$0$577$b45e6eb0@senator-bedfellow.mit.edu...
>> >> Michael Newberry <mnewberry@axres.com> wrote:
>> >>
>> >> > He is talking about a different problem. I have used your method, and it
>> >> > works splendidly when taking different samples in the same image because
>> >> > most of the sample values do not change when you move the sample region.
>> >> > Therefore there are relatively few replacements and the re-calculation is
>> >> > quick. However, Ram has a different problem of measuring variance in an
>> >> > image stream, and I assume that the pixel values are not repeated from one
>> >> > image to the next. So he has to compute a new mean and variance each time.
>> >>
>> >> No, he can still use this technique. It is unclear to me if the OP needs
>> >> to compute the mean and variance of all images together or only of the last
>> >> N images, still here's how you do it:
>> >>
>> >> In the hypothesis of computing mean and variance of all images you begin
>> >> by setting SUM and SUM2 to 0 (zero). For each new image you compute
>> >> sum(I[x,y]) and sum(I[x,y] * I[x,y]), you add these values to SUM and SUM2
>> >> and compute mean and variance from these.
>> >>
>> >> In the hypothesis of computing mean and variance of the last N images you
>> >> store in a circular queue the values of SUM and SUM2 for the last N images
>> >> and for each new image you deduct from your accumulators the SUM and SUM2
>> >> of the oldest image in the queue (the first item in the queue) and add
>> >> the SUM and SUM2 of the new image. You also store the SUM and SUM2 of
>> >> the new image in your circular queue. You now can compute mean and variance.
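
The two cases described above (all images, or only the last N) can be sketched roughly as follows. This is only an illustration: the class name, the `window=None` convention for "all images", and the use of population variance are assumptions, not something stated in the thread.

```python
from collections import deque


class FrameWindowStats:
    """Mean/variance of all pixels in the last `window` images.

    With window=None nothing is ever evicted, which gives the
    "all images together" case. Hypothetical helper for illustration.
    """

    def __init__(self, window=None):
        self.window = window
        self.queue = deque()   # one (count, SUM, SUM2) triple per stored image
        self.count = 0         # number of pixels currently accumulated
        self.total = 0.0       # SUM accumulator
        self.total2 = 0.0      # SUM2 accumulator

    def push(self, pixels):
        n = len(pixels)
        s = sum(pixels)
        s2 = sum(p * p for p in pixels)
        if self.window is not None:
            if len(self.queue) == self.window:
                # Deduct the oldest image's contribution from the accumulators.
                old_n, old_s, old_s2 = self.queue.popleft()
                self.count -= old_n
                self.total -= old_s
                self.total2 -= old_s2
            self.queue.append((n, s, s2))
        self.count += n
        self.total += s
        self.total2 += s2

    def mean(self):
        return self.total / self.count

    def variance(self):
        m = self.mean()
        return self.total2 / self.count - m * m
```

Note that only three numbers per image go into the queue, not the image itself, so the storage cost does not depend on the image size.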
>> >>
>> >> Paolo
>> >>
>> >> > Let's say the statistics are measured over a width and height. Ram wants to
>> >> > measure the whole image. I think there are basically 2 ways to get to a
>> >> > faster calculation:
>> >>
>> >> > 1. Increase the calculation speed, and/or
>> >> > 2. Reduce the sample size (number of pixels).
>> >>
>> >> > This leads to a few methods:
>> >>
>> >> > 1. Use faster and/or specialized hardware to make the computation.
>> >> > 2. Sample a subregion. Remember that computation time goes as sample
>> >> > width^2, so sampling only the central 1/2 will take about 1/4 the time to
>> >> > calculate.
>> >> > 3. Use a subsample---that is, fewer pixels---as being representative of
>> >> > the entire image. This is related to option 2, except that the pixels might
>> >> > be more intelligently chosen, such as in small, strategically positioned
>> >> > rectangle regions, or as spaced raster lines.
>> >>
>> >> > Personally, I would look carefully at item 3. Try to justify to yourself
>> >> > why you need to sample *every* one of the pixels in each M x N image.
>> >>
>> >> > Michael
>> >>
>> >> > "Vladimir Drzik" <vdrzik@nextra.sk> wrote in message
>> >> >> ip4ram@yahoo.com wrote in message
>> >> >> > I have a series of images (640 x 480) coming in and need to
>> >> >> > calculate the mean and variance of R,G,B components of each pixel. The
>> >> >> > problem is
>> >> >> >
>> >> >> > 1. I have no way to store the sequence of images. All I am allowed to
>> >> >> > store is mean and variance (and the number of frames, of course). I have
>> >> >> > to calculate the new mean and new variance when new data comes
>> >> >> > in. (New mean = (n*old_mean + new data)/(n+1)).
>> >> >> >
>> >> >> > 2. I derived a formula for calculating new variance, based on the new
>> >> >> > data, old mean, old variance and new mean (which can be calculated as
>> >> >> > above). But the problem is this formula is pretty complex in terms of
>> >> >> > multiplication (as it needs to be calculated 640 * 480 * 3 times).
>> >> >> >
>> >> >> > Is there any formula/mathematical approximation for new variance in
>> >> >> > terms of old variance, old mean, new mean, and new data? i.e. new variance
>> >> >> > = Function(old mean, old variance, new data).
>> >> >> >
>> >> >> > Example:
>> >> >> > say data is 1,2,2,2,2,1,2,3. We have the old mean and old variance
>> >> >> > calculated for this data (n = 8). When a new data point, say 5, comes in, how
>> >> >> > do we update/approximate the new variance in terms of old mean and old
>> >> >> > variance (assuming we do not have access to data, since data cannot be
>> >> >> > stored).
>> >> >> >
>> >> >> > Thanks in advance
>> >> >> > Ram
>> >> >>
>> >> >> Hi Ram,
>> >> >>
>> >> >> At http://mathworld.wolfram.com/SampleVarianceComputation.html , there
>> >> >> is the expression you want. I don't know if it's less complex than the
>> >> >> expression you derived.
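
For reference, one way to write an update of the form asked for, new variance = Function(old mean, old variance, new data), is the standard incremental (Welford-style) recurrence. The sketch below is not copied from the MathWorld page; it assumes the population variance (dividing by the number of samples), and the function name is made up:

```python
def update_mean_variance(n, mean, var, x):
    """Fold one new sample x into the mean and population variance of n samples.

    Hypothetical helper: returns (n + 1, new_mean, new_var).
    """
    new_mean = mean + (x - mean) / (n + 1)
    # n * var is the old sum of squared deviations; the product below is
    # the increment contributed by the new sample.
    new_var = (n * var + (x - mean) * (x - new_mean)) / (n + 1)
    return n + 1, new_mean, new_var


# Ram's example: 1,2,2,2,2,1,2,3 has mean 1.875 and variance 23/64; then 5 arrives.
n, mean, var = update_mean_variance(8, 1.875, 23 / 64, 5)
```

Per pixel and per channel this costs two multiplications and two divisions (the divisions can share one precomputed 1/(n+1) per frame).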
>> >> >>
>> >> >> However, there is another approach, which (seems to me) requires less
>> >> >> computation. Instead of storing current mean and variance, store only
>> >> >> current sum of all values (Sx) and sum of squares of all values (Sxx).
>> >> >> Then, at each moment, variance can be computed as
>> >> >> (N*Sxx - Sx*Sx) / (N*N)
>> >> >> where N is the number of samples. Of course, current mean can be
>> >> >> computed as
>> >> >> Sx / N
>> >> >> but you don't need this value for variance computation.
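
The Sx/Sxx approach quoted above, applied to Ram's example data, might look like this. It is only a sketch; the function names are invented, and the formula is the population variance exactly as given in the quoted text:

```python
def add_sample(n, sx, sxx, x):
    """Fold one new value into the count, sum, and sum of squares."""
    return n + 1, sx + x, sxx + x * x


def mean_and_variance(n, sx, sxx):
    """Mean is Sx / N; variance is (N*Sxx - Sx*Sx) / (N*N)."""
    return sx / n, (n * sxx - sx * sx) / (n * n)


# Accumulate Ram's eight values without keeping them around.
n, sx, sxx = 0, 0, 0
for value in [1, 2, 2, 2, 2, 1, 2, 3]:
    n, sx, sxx = add_sample(n, sx, sxx, value)
old_mean, old_var = mean_and_variance(n, sx, sxx)

# The new value 5 arrives; only n, Sx and Sxx are needed to update.
n, sx, sxx = add_sample(n, sx, sxx, 5)
new_mean, new_var = mean_and_variance(n, sx, sxx)
```

With integer pixel data the two sums stay exact as long as they fit in the integer type; with floating-point sums, subtracting two large, nearly equal terms can lose precision when the variance is small relative to the mean.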
>> >> >>
>> >> >> Regards,
>> >> >>