Re: Standard Deviation Question



On 15 Aug 2006 20:51:46 -0700, "DJENSE00@xxxxxxxxx"
<DJENSE00@xxxxxxxxx> wrote:

> Hi all...
>
> Probably a really naive question so pardon me... but here goes!
>
> We routinely do standard deviations for all our data as part of a
> reporting process. However, sometimes the number of data points for
> each sample falls to two. I am proposing that we do not calculate
> standard deviations on any sample of size less than three. It is somewhat
> common sense to me that it wouldn't mean much and probably a sample of
> size three isn't much better but we have to draw the line somewhere.

What you are dealing with is the variability of the estimate
itself. For the theory, start with the square of the SD, which is the
variance: One "variance ratio test" for homogeneity of two
variances is simply the F-test from putting the larger over the
smaller.

Okay, now consider how stable two variance estimates are,
by looking at the distribution of the F-statistic for certain
degrees of freedom. Here is a little table for df= 10,10
(Ns of 11) and df= 100,100. [Might switch to fixed font].

  d.f.       (10,10)   (100,100)
  tail (p)
   25%         1.55       1.17
    5%         2.97       1.39
    1%         4.85       1.59

With two Normal samples of d.f. (10,10), 1% of the time,
you get a variance ratio of 4.85 to 1, or SD ratio of about 2.2.
-- Since the formula says, "Put the larger variance on top",
and that could be either group, the effective tail p-level is
actually 2%, not the nominal 1%. I will ignore "the actual"
in the rest of the note. We will hope that I was accurate in
copying the tabled values, as reported.
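
The doubling of the tail level can be checked by simulation. The
sketch below is an illustration, not part of the tabled theory: it
assumes two independent Normal samples of 11 points each (10 d.f.),
uses only the Python standard library, and the function names, seed,
and trial count are arbitrary choices made for the example.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def larger_over_smaller_exceeds(cutoff, n=11, trials=20000, seed=1):
    """Fraction of trials where (larger variance / smaller variance) > cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        b = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        # "Put the larger variance on top", whichever group it comes from.
        if max(a, b) / min(a, b) > cutoff:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    frac = larger_over_smaller_exceeds(4.85)
    print(f"share of ratios above 4.85: {frac:.3f}")
```

Up to simulation noise, the share should land near 0.02 rather than
0.01, because either group can supply the larger variance.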

With two Normal samples of d.f. (100,100), 1% of the time,
you get a variance ratio of 1.59 to 1, or SD ratio of about 1.26,
which is much better than 2.2.


Next, consider the accuracy of a 1 d.f. estimate taken as the
*smaller* estimate, so it is in the denominator of the F(inf, 1) ratio.
Below is a table with 1, 2, 3, 4, and 10 d.f. The huge gain
of accuracy is clearly between 1 and 2, where you can expect the
1%-of-the-time inaccuracy in the SD to decrease from 80-fold
to 10-fold; but 2 d.f. and 3 d.f. don't look very desirable,
at 10-fold and 5-fold error. How much inaccuracy can you live
with? How many d.f. do the other estimates have, and how
many estimates would be censored to 'missing'? Is it more
useful to have an inaccurate estimate than no estimate at all?

  d.f.       (inf,1)   (inf,2)   (inf,3)   (inf,4)   (inf,10)
  tail (p)
   25%          9.9       3.5      2.47      2.08      1.48
    5%        254.       19.5      8.53      5.6       2.54
    1%       6366.       99.5     26.1      13.4       3.91
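
The first two columns can be reproduced without a table. With
infinite d.f. on top, the F ratio reduces to k divided by a
chi-square variate with k d.f., so the upper-tail critical value is
k over the lower p-quantile of that chi-square. For 1 d.f. the
chi-square is a squared Normal deviate, and for 2 d.f. it is an
exponential with mean 2. The sketch below uses those two closed
forms; the function names are made up for the example, and the
columns for 3, 4, and 10 d.f. would need a statistics library.

```python
import math
from statistics import NormalDist

def f_inf_1(p):
    """Upper-tail critical value of F(inf, 1) at tail probability p."""
    # Square root of the lower p-quantile of chi-square with 1 d.f.
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return 1.0 / (z * z)

def f_inf_2(p):
    """Upper-tail critical value of F(inf, 2) at tail probability p."""
    # Lower p-quantile of chi-square with 2 d.f. (exponential, mean 2).
    chi2_lower = -2.0 * math.log(1.0 - p)
    return 2.0 / chi2_lower

if __name__ == "__main__":
    for p in (0.25, 0.05, 0.01):
        for label, f in (("(inf,1)", f_inf_1(p)), ("(inf,2)", f_inf_2(p))):
            print(f"{p:>5.0%} {label}: F = {f:8.1f}, SD ratio = {math.sqrt(f):5.1f}")
```

The square roots at the 1% level come out near 80 and 10, which is
where the "80-fold to 10-fold" figures for the SD come from.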


Another consideration may be that the 1 d.f. estimate is
not going to be too *large* by the same margin. A few random
points may be not-close, but being a few z-units out does
not have the same profound effect on the ratio. Here is the
1 d.f. estimate in the numerator ("larger"):

  d.f.       (1,inf)
  tail (p)
   25%         1.32
    5%         3.84
    1%         6.64
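
These values can also be checked directly, since F(1, inf) is just
the square of a standard Normal deviate and its upper-tail critical
value is the square of the two-sided z cutoff. A minimal sketch,
with a function name chosen only for this example:

```python
from statistics import NormalDist

def f_1_inf(p):
    """Upper-tail critical value of F(1, inf) at tail probability p."""
    # P(Z**2 > c) = p  means  sqrt(c) is the two-sided z cutoff.
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z

if __name__ == "__main__":
    for p in (0.25, 0.05, 0.01):
        f = f_1_inf(p)
        print(f"{p:>5.0%}: F = {f:.2f}, SD ratio = {f ** 0.5:.2f}")
```

The 1% value comes out at about 6.63, which agrees with the tabled
6.64 to within rounding or copying. The matching SD ratio is only
about 2.6, against roughly 80 when the 1 d.f. estimate is the
smaller one.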




> Can someone give me a reason based on statistical theory why we might
> want to calculate the standard deviations or not calculate them if the
> sample size falls below a threshold value - in our case 2. Also, is

I hope the numbers above will help.

> there a valid statistical reason why a mean might be somewhat
> appropriate for such a small sample size. I am not a statistician -
> but it seems that a mean might be at least marginally more appropriate
> for such a small sample as opposed to a standard deviation. Is my
> intuition correct or can someone convince me I am wrong? I am
> interested in having a statistical justification one way or another to
> back up my intuition - or shoot it down as the case may be.

I don't understand the follow-up question. You seem to
be comparing the *worth* of the SD to that of the mean.
Usually, "location" and "scale" are each interesting,
separately.

Are you never, or regularly, presenting the means?
Do the means tell you what SD to expect, because the
distributions are log-normal or Poisson?



--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html