Re: data collection control




John Lai wrote:
> Hello All,
>
> I am trying to design a reasonable approach to collect and control
> experimental data.
>
> The data I am collecting are peak signal-to-noise ratio (PSNR)
> measurements. The higher the PSNR value, the better the result.
>
> The datasets are independent of each other, as most data come from
> different measuring sources or from the same source at a different time. I
> would like to compare each incoming dataset against a minimum threshold
> value. If a measured dataset falls below this threshold, I reject its
> current value and ask the owner of the dataset to repeat the experiment
> with different settings so that the PSNR is equal to or higher than the
> threshold. Once a dataset is accepted, I add it to the current sampling
> population. At the end of each sampling period, I update the mean value of
> all the data collected.
>
> My problem is how to design this threshold value. My current thinking is
> to let the threshold be the arithmetic mean of the previous sampling
> period's population mentioned above.
>
> Q1/ Since I cannot allow any dataset to fall below the current updated
> threshold, the accumulated mean value calculated will be biased -- is this a
> bad thing? (Or can I regard it as simply conforming to the experimental
> design, which is that every dataset must be equal to or greater than a
> given threshold?)
>
> Q2/ If the threshold value (arithmetic mean) is determined as above, the
> variance calculated from the population is meaningless (because the datasets
> collected are skewed). Hence I can't really do a confidence test on two
> similar experimental data populations taken on different dates, or can I?
>
> Q3/ If the above approach of determining threshold is flawed, is there
> another more meaningful (or better) statistical measurement I can use for
> this threshold value (instead of the arithmetic mean)? I expect the
> population to exceed 2000 datasets each time I sample it (so a moving
> average is not possible due to the amount of data).
>
> Is anyone able to offer some advice, please?
>
> Thanks in advance,
>
> John

John,

I don't know if I can be of help, but what you are doing is akin to
detecting anomalies, which is something I know a lot about.

It could also be regarded as a single-dimension cluster analysis.

You might want to send me a spreadsheet of values and give me a call
for a brief discussion. I might be able to point you in the right
direction.

Dave Reilly
Automatic Forecasting Systems
http://www.autobox.com
215-675-0652


P.S. Don't let the word "forecasting" scare you. In order to assess an
exceptional value, one needs to have an equation. This equation leads
to an expectation, which in turn lets one compute the probability of
observing a given value before it is observed, and hence gives one the
facility to declare a value INCONSISTENT with expectations.
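
To make that chain concrete, here is a minimal sketch in Python. It is not
the Autobox method. As an assumption for illustration, the "equation" is
just a normal distribution fitted to previously accepted values. The PSNR
numbers are made up, and the names lower_tail_probability and
is_inconsistent and the 1% cutoff alpha are hypothetical choices.

```python
from statistics import NormalDist, mean, stdev


def lower_tail_probability(history, new_value):
    """P(value <= new_value) under a normal model fitted to history.

    Assumption: past accepted values are roughly normal. Real PSNR data
    may need a different model (trend, source effects, skew, ...).
    """
    model = NormalDist(mu=mean(history), sigma=stdev(history))
    return model.cdf(new_value)


def is_inconsistent(history, new_value, alpha=0.01):
    """Flag new_value if a value this low is less likely than alpha."""
    return lower_tail_probability(history, new_value) < alpha


# Made-up PSNR values in dB, for illustration only.
history = [30.1, 31.4, 29.8, 30.6, 31.0, 30.3, 29.9, 30.8]

for candidate in (30.2, 27.0):
    p = lower_tail_probability(history, candidate)
    flag = is_inconsistent(history, candidate)
    print(f"{candidate:.1f} dB: P(value <= observed) = {p:.4f}, "
          f"inconsistent = {flag}")
```

A probability cutoff like this differs from a threshold set at the mean,
which would reject roughly half of all ordinary values.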
