Re: Standard Deviation Question
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Wed, 16 Aug 2006 21:50:26 -0400
On 15 Aug 2006 20:51:46 -0700, "DJENSE00@xxxxxxxxx"
<DJENSE00@xxxxxxxxx> wrote:
Hi all...
Probably a really naive question, so pardon me... but here goes!
We routinely compute standard deviations for all our data as part of a
reporting process. However, sometimes the number of data points for
a sample falls to two. I am proposing that we do not calculate
standard deviations on any sample smaller than three. It seems like
common sense to me that an SD from two points wouldn't mean much, and
probably a sample of size three isn't much better, but we have to draw
the line somewhere.
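A minimal sketch of the cutoff rule proposed above. The helper name `summarize` and the default `min_n=3` are hypothetical illustrations of the proposal, not an existing tool: the mean is always reported, and the SD is reported as missing (None) below the cutoff.

```python
import statistics

def summarize(values, min_n=3):
    """Return (n, mean, sd) for one sample.

    Hypothetical reporting rule: the SD is censored to None (missing)
    when the sample has fewer than min_n points. stdev itself needs
    at least 2 points, hence the max(min_n, 2).
    """
    n = len(values)
    mean = statistics.mean(values) if n > 0 else None
    sd = statistics.stdev(values) if n >= max(min_n, 2) else None
    return n, mean, sd

print(summarize([5.0, 7.0]))       # → (2, 6.0, None)
print(summarize([1.0, 2.0, 3.0]))  # → (3, 2.0, 1.0)
```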
What you are dealing with is the variability of the estimate
itself. For the theory, start with the square of the SD, which is the
variance. One "variance ratio test" for homogeneity of two
variances is simply the F-test: put the larger variance over the
smaller.
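A small sketch of that statistic, assuming two plain lists of numbers; the function name `variance_ratio` and the sample data are hypothetical. It puts the larger sample variance on top and returns the matching degrees of freedom (n - 1 for each sample).

```python
import statistics

def variance_ratio(a, b):
    """Variance-ratio statistic for two samples: larger variance on top.

    Returns (F, df_numerator, df_denominator), where each d.f. is n - 1
    for the sample whose variance sits in that position.
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

# Hypothetical data: the second sample is the first scaled by 2,
# so its variance is 4 times larger.
print(variance_ratio([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # → (4.0, 2, 2)
```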
Okay, now consider how stable two variance estimates are,
by looking at the distribution of the F-statistic for certain
degrees of freedom. Here is a little table for df= 10,10
(Ns of 11) and df= 100,100. [Might switch to fixed font].
d.f.        (10,10)   (100,100)
tail (p)
 25%         1.55       1.17
  5%         2.97       1.39
  1%         4.85       1.59
With two Normal samples of d.f. (10,10), 1% of the time,
you get a variance ratio of 4.85 to 1, or an SD ratio of about 2.2.
Since the formula says, "Put the larger variance on top,"
and that could be either group, the effective tail p-level is
actually 2%, not the nominal 1%. I will ignore that doubling
in the rest of the note. Let's hope I was accurate in
copying the tabled values.
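The 1% point and the "either group on top" doubling can be checked roughly by simulation. This is a Monte Carlo sketch, assuming standard Normal data, samples of size 11 (d.f. 10), and an arbitrary trial count and seed; the function names are hypothetical. One fraction keeps a fixed order (first over second), the other puts the larger variance on top.

```python
import random

def sample_var(xs):
    """Sample variance with n - 1 in the denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def tail_rates(n=11, cutoff=4.85, trials=20000, seed=1):
    """Simulate pairs of Normal samples of size n (d.f. = n - 1).

    Returns the fraction of trials where
      - the first variance over the second exceeds cutoff (fixed order), and
      - the larger variance over the smaller exceeds cutoff.
    """
    rng = random.Random(seed)
    fixed = larger = 0
    for _ in range(trials):
        v1 = sample_var([rng.gauss(0.0, 1.0) for _ in range(n)])
        v2 = sample_var([rng.gauss(0.0, 1.0) for _ in range(n)])
        fixed += v1 / v2 > cutoff
        larger += max(v1, v2) / min(v1, v2) > cutoff
    return fixed / trials, larger / trials

# Expect roughly 0.01 and 0.02; the exact values depend on the seed.
print(tail_rates())
```

The larger-over-smaller rate is about twice the fixed-order rate because exceeding the cutoff in either direction counts, and the two directions cannot both happen at once.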
With two Normal samples of d.f. (100,100), 1% of the time,
you get a variance ratio of 1.59 to 1, or an SD ratio of about 1.26,
which is much better than 2.2.
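The SD ratios quoted above are just square roots of the tabled variance ratios, since the SD is the square root of the variance. A one-line sketch (the helper name is hypothetical):

```python
import math

def sd_ratio(variance_ratio):
    """Convert a ratio of variances into the corresponding ratio of SDs."""
    return math.sqrt(variance_ratio)

# Prints "(10,10) 2.2" then "(100,100) 1.26".
for label, f in [("(10,10)", 4.85), ("(100,100)", 1.59)]:
    print(label, round(sd_ratio(f), 2))
```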
Next, consider the accuracy of a 1 d.f. estimate taken as the
*smaller* estimate, so it is in the denominator of F(inf, 1).
Below is a table with 1, 2, 3, 4, and 10 d.f. The huge gain
of accuracy is clearly between 1 and 2, where you can expect the
1%-of-the-time inaccuracy in the SD to decrease from 80-fold
to 10-fold; but 2 d.f. and 3 d.f. don't look very desirable,
at 10-fold and 5-fold error. How much inaccuracy can you live
with? How many d.f. do the other estimates have, and how
many estimates would be censored to 'missing'? Is it more
useful to have an inaccurate estimate than no estimate at all?
d.f.      (inf,1)  (inf,2)  (inf,3)  (inf,4)  (inf,10)
tail (p)
 25%        9.9      3.5      2.47     2.08     1.48
  5%      254.      19.5      8.53     5.6      2.54
  1%     6366.      99.5     26.1     13.4      3.91
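These entries can be reproduced from the chi-square distribution: F(inf, df) is distributed as df / chi2(df), so its upper-p point is df divided by the lower-p quantile of chi-square with df d.f. The sketch below is a stdlib-only illustration (power series for the regularized incomplete gamma function, then bisection); the function names are hypothetical, and a library routine such as SciPy's `scipy.stats.chi2` would normally be preferred.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the power series for the regularized lower
    incomplete gamma function P(df/2, x/2). Adequate for the small x
    used here; not tuned for very large arguments."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= y / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total

def chi2_ppf(p, df):
    """Lower-p quantile of chi-square with df d.f., found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def f_inf_upper(df, p):
    """Upper-p point of F(inf, df) = df / chi2(df).

    A df-d.f. variance estimate falls below the true variance by this
    factor or more p of the time."""
    return df / chi2_ppf(p, df)

# The 1% row, plus the matching fold-error in the SD (its square root).
for df in (1, 2, 3, 4, 10):
    f = f_inf_upper(df, 0.01)
    print(df, round(f, 1), "SD fold:", round(math.sqrt(f), 1))
```

The square roots give the roughly 80-fold, 10-fold and 5-fold SD errors discussed above for 1, 2 and 3 d.f.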
Another consideration may be that the 1 d.f. estimate is
not going to be too *large* by the same margin. A few random
points may be far from the mean, but being a few z-units out does
not have the same profound effect on the ratio. Here is the
1 d.f. estimate in the numerator ("larger"):
d.f.      (1,inf)
tail (p)
 25%        1.32
  5%        3.84
  1%        6.64
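This direction has a closed form: F(1, inf) is chi-square with 1 d.f., which is the square of a standard Normal Z, so its upper-p point is the square of the upper p/2 point of Z. A short sketch using the standard library's `statistics.NormalDist` (the helper name is hypothetical):

```python
from statistics import NormalDist

def f1_inf_upper(p):
    """Upper-p point of F(1, inf), i.e. of chi-square with 1 d.f.

    chi2(1) is Z**2 for a standard Normal Z, so P(Z**2 > q) = p
    exactly when sqrt(q) is the upper p/2 point of Z."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z

for p in (0.25, 0.05, 0.01):
    f = f1_inf_upper(p)
    print(f"{p:.0%}", round(f, 2), "SD fold:", round(f ** 0.5, 2))
```

At 1%, the too-large error in the SD is only about 2.6-fold, compared with about 80-fold in the too-small direction.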
Can someone give me a reason based on statistical theory why we might
want to calculate the standard deviations, or not calculate them, if the
sample size falls below a threshold value - in our case 2?
I hope the numbers above will help.
Also, is there a valid statistical reason why a mean might be somewhat
appropriate for such a small sample size? I am not a statistician,
but it seems that a mean might be at least marginally more appropriate
for such a small sample as opposed to a standard deviation. Is my
intuition correct, or can someone convince me I am wrong? I am
interested in having a statistical justification one way or another to
back up my intuition, or shoot it down, as the case may be.
I don't understand the follow-up question. You seem to
be comparing the *worth* of the SD to that of the mean.
Usually, "location" and "scale" are each interesting,
separately.
Are you not regularly presenting the means?
Do the means tell you what SD to expect, because the
distributions are log-normal or Poisson?
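For those two families the mean does pin down the SD, given the shape. A sketch of the standard relationships: for Poisson counts the variance equals the mean; for a log-normal whose logarithm has SD `sigma` (assumed known here, a hypothetical input), the SD is proportional to the mean. Function names are hypothetical.

```python
import math

def poisson_sd(mean):
    """For Poisson counts the variance equals the mean."""
    return math.sqrt(mean)

def lognormal_sd(mean, sigma):
    """For a log-normal whose log has SD sigma, the SD is proportional
    to the mean: SD = mean * sqrt(exp(sigma**2) - 1)."""
    return mean * math.sqrt(math.expm1(sigma ** 2))

print(poisson_sd(4.0))          # → 2.0
print(lognormal_sd(10.0, 0.5))  # coefficient of variation fixed by sigma
```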
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html