Re: Test for uniform distribution for small sample size
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Wed, 04 Oct 2006 20:21:22 -0400
On 3 Oct 2006 16:08:43 -0700, me13013@xxxxxxxxx wrote:
>> If all 5 points are less than 0.01, I think you can have
>> high confidence that the unit interval is not appropriate
>> as the descriptor, by most tests.
>> What test you should use, and what power you will have
>> in using it, depends on the "alternative" or the family
>> of alternatives.
>
> As I understand this, since I have no knowledge of what the alternative
> distributions might be, about the only thing a test can tell me is
> whether to reject (with some confidence level) the hypothesis that the
> data is uniform. It can't tell me whether to accept that hypothesis
> though, right?

Are you sure you can say nothing?
How are the numbers generated? Is the "alternative" going
to be a shortened range, but still uniform? Is it going to be
something with a basement-effect or ceiling-effect, where
numbers are clumped at 0 or 1?
When we test a new statistic, or a statistic that is considered
in odd circumstances, we can generate "null" datasets and
hope that the resulting p-values are properly uniform across 0-1.
In this case, there is particular concern that the tail-proportions
are accurate, i.e., that a 5% or 1% test will "reject" the right
number of times. But that is an example with a very *large*
N, rather than the small N that you are asking about.
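
The null-dataset check described above can be sketched in Python. This is only an illustration under assumptions of my own: the statistic is the sample maximum (a test aimed at the "shortened range" kind of alternative), chosen because its null p-value has a closed form, and the names max_test_p_value and null_rejection_rate are made up for the example.

```python
import random


def max_test_p_value(sample):
    """Lower-tail p-value of the sample maximum under Uniform(0, 1).

    For n independent Uniform(0, 1) draws, P(max <= m) = m ** n, so a small
    value flags a sample whose range looks shortened at the top.
    """
    return max(sample) ** len(sample)


def null_rejection_rate(n, alpha, n_datasets, rng):
    """Fraction of simulated null datasets whose p-value is <= alpha."""
    rejections = 0
    for _ in range(n_datasets):
        sample = [rng.random() for _ in range(n)]
        if max_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / n_datasets


rng = random.Random(2006)  # fixed seed so the run is repeatable
for alpha in (0.05, 0.01):
    rate = null_rejection_rate(n=5, alpha=alpha, n_datasets=200_000, rng=rng)
    print(f"nominal {alpha:.2f}  observed {rate:.4f}")
```

If the p-values are properly uniform under the null, the observed rejection rates should land close to the nominal 5% and 1%. Here the "large N" is the number of simulated datasets, not the size of each sample.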
>> The K-S statistic is the largest vertical distance between the
>> observed and the hypothesized CDF. If all 5 points are less than
>> 0.01, then the observed CDF is 1.0 where the hypothesized is
>> about 0.01, for a difference of 0.99.
>
> I'm not following you there. My concept of a CDF is as a function from
> the set of observations to a cumulative probability. I can't match
> this up with the CDF being a single number.

In my statement, "1.0" was the value of the observed CDF at the
5th (largest) point. That is the place where the misfit is worst,
because the hypothesized CDF there is still only about 0.01.
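
To make the arithmetic concrete, here is a minimal sketch of the one-sample K-S statistic against Uniform(0, 1), where the hypothesized CDF is simply F(x) = x. The function name and the five sample values are made up for illustration; the only thing taken from the discussion is that all five values fall below 0.01.

```python
def ks_statistic_uniform(sample):
    """One-sample K-S statistic against Uniform(0, 1), where F(x) = x.

    Assumes every value lies in the unit interval.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap to F(x) = x is checked on both sides of the jump.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


points = [0.001, 0.003, 0.005, 0.007, 0.009]  # made-up values, all below 0.01
print(ks_statistic_uniform(points))
```

For these values the largest gap occurs at the fifth point, where the empirical CDF is 5/5 = 1.0 and the hypothesized CDF is 0.009, which gives a statistic of about 0.991.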
> Now, I can see that your
> example would produce a K-S statistic of something around .99, so maybe
> all I'm missing is an understanding of the lingo.
>
>> I'm pretty sure that this
>> would be a rare result for 5 data points. There are tables that
>> show what some p-values are, or you could run your own
>> simulations to see how rare your observed difference is.
>
> I have been doing some simulations in R, to try to get a feel for it.
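
A simulation of that kind might look like the following sketch, written in Python rather than R purely for illustration. The function names, the seed, the number of replicates, and the five "observed" values are all assumptions of mine. The add-one adjustment in the p-value is a common Monte Carlo convention that keeps the estimate from being exactly zero; it is not something prescribed in this thread.

```python
import random


def ks_statistic_uniform(sample):
    """One-sample K-S statistic against Uniform(0, 1), where F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Compare F(x) = x with the empirical CDF just after and just before x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def simulate_null_statistics(n, n_datasets, rng):
    """K-S statistics for n_datasets samples of size n drawn from Uniform(0, 1)."""
    return [
        ks_statistic_uniform([rng.random() for _ in range(n)])
        for _ in range(n_datasets)
    ]


def monte_carlo_p_value(observed, null_stats):
    """Share of null statistics at least as large as the observed one.

    Uses the add-one adjustment so the estimate is never exactly zero.
    """
    exceed = sum(1 for d in null_stats if d >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


rng = random.Random(13013)  # fixed seed so the run is repeatable
null_stats = simulate_null_statistics(n=5, n_datasets=100_000, rng=rng)

observed = ks_statistic_uniform([0.001, 0.003, 0.005, 0.007, 0.009])
print("Monte Carlo p-value:", monte_carlo_p_value(observed, null_stats))

# Empirical 95th percentile of the null statistics, i.e. a simulated 5% cutoff.
cutoff = sorted(null_stats)[int(0.95 * len(null_stats))]
print("simulated 5% cutoff:", round(cutoff, 3))
```

Published K-S tables put the 5% critical value for n = 5 at roughly 0.56, so the simulated cutoff should come out near that figure, and a statistic around 0.99 should essentially never occur among the null datasets.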
> Stepping back, my real question is, if I have a small sample size, what
> is the best way to "decide" whether or not it was generated by a
> uniform distribution? Is K-S a good choice? Some references suggest
> that Anderson-Darling is better but I don't know if that applies in the
> case of the uniform distribution. Are there other tests I should be
> considering?

Where do the data come from, and what are you doing with them?
Different tests usually are each 'better' at something-or-other.
Another question might be, What sort of *deviation* from
uniform will bother you? Are you doing hundreds of tests? (Why?)
Is the problem when all the numbers are in a narrow range? near
one particular end?
How many digits of accuracy are given, and is "continuous" important?
For instance, if two values are the same, is that devastating for
your purposes?
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html