Re: Test for uniform distribution for small sample size



On 3 Oct 2006 16:08:43 -0700, me13013@xxxxxxxxx wrote:

If all 5 points are less than 0.01, I think you can have
high confidence that the unit interval is not appropriate
as the descriptor, by most tests.

What test you should use, and what power you will have
in using it, depends on the "alternative" or the family
of alternatives.

As I understand this, since I have no knowledge of what the alternative
distributions might be, about the only thing a test can tell me is
whether to reject (with some confidence level) the hypothesis that the
data is uniform. It can't tell me whether to accept that hypothesis
though, right?

Are you sure you can say nothing?

How are the numbers generated? Is the "alternative" going
to be a shortened range, but still uniform? Is it going to be
something with a basement-effect or ceiling-effect, where
numbers are clumped at 0 or 1?

When we test a new statistic, or a statistic that is considered
in odd circumstances, we can generate "null" datasets and
hope that the resulting p-values are properly uniform across 0-1.
In this case, there is particular concern that the tail-proportions
are accurate, i.e., that a 5% or 1% test will "reject" the right
number of times. But that is an example with a very *large*
N, rather than the small N that you are asking about.
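As a rough sketch of that kind of null check: generate many uniform samples, compute a p-value for each, and see whether a 5% or 1% test rejects about 5% or 1% of the time. Python is used here only for illustration. The test itself (a one-sided test on the sample maximum, aimed at the "shortened range" alternative) and all function names are assumptions made for this example, not anything proposed in the thread.

```python
import random

def shortened_range_p_value(sample):
    # Illustrative test, assumed for this sketch: against "uniform on [0, b]
    # with b < 1", a small sample maximum is the evidence. Under the null
    # (uniform on [0, 1]), P(max <= m) = m**n, so that is the p-value.
    n = len(sample)
    return max(sample) ** n

def null_rejection_rates(n=5, n_datasets=100_000, levels=(0.05, 0.01), seed=1):
    # Generate "null" datasets and count how often each nominal level rejects.
    rng = random.Random(seed)
    counts = {level: 0 for level in levels}
    for _ in range(n_datasets):
        sample = [rng.random() for _ in range(n)]
        p = shortened_range_p_value(sample)
        for level in levels:
            if p <= level:
                counts[level] += 1
    return {level: counts[level] / n_datasets for level in levels}

rates = null_rejection_rates()
print(rates)  # each observed rate should sit close to its nominal level
```

Here the large N is the number of simulated datasets, while each dataset still has only 5 points. If the observed rates drift well away from 0.05 and 0.01, the p-values are not behaving as advertised in the tails.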



The K-S statistic is the largest distance between the observed
and the hypothesized CDF. If all 5 points are less than 0.01,
then the observed CDF is 1.0 where the hypothesized is
about 0.01, for a difference of about 0.99.

I'm not following you there. My concept of a CDF is as a function from
the set of observations to a cumulative probability. I can't match
this up with the CDF being a single number.

In my statement, the observed CDF at the 5th (largest) point was "1.0";
that point, near 0.01, is the place where the misfit was the worst.
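To make the lingo concrete, here is a minimal sketch in Python: the empirical CDF is a function of x, and the K-S statistic is the single number you get by taking its largest gap from the hypothesized CDF, which for the unit-interval uniform is F(x) = x. The five data values and the function names are made up for illustration.

```python
def ecdf(sample, x):
    # Empirical CDF: fraction of observations less than or equal to x.
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_statistic_uniform(sample):
    # Largest gap between the empirical CDF and F(x) = x on [0, 1].
    # The empirical CDF jumps at each sorted point, so both sides of
    # every jump are checked.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

# Hypothetical data: five points, all below 0.01.
points = [0.001, 0.003, 0.005, 0.007, 0.009]

print(ecdf(points, 0.009))           # observed CDF has already reached 1.0
print(ks_statistic_uniform(points))  # about 0.991, i.e. 1.0 minus 0.009
```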

Now, I can see that your
example would produce a K-S statistic of something around .99, so maybe
all I'm missing is an understanding of the lingo.

I'm pretty sure that this
would be a rare result for 5 data points. There are tables that
show what some p-values are, or you could run your own
simulations to see how rare your observed difference is.
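A simulation along those lines might look like the following sketch, assuming Python and a hand-rolled K-S statistic for the unit-interval uniform (the names, seed, and number of simulations are arbitrary choices for illustration). It draws many samples of 5 from the null, records the statistic for each, and then asks how often a value as large as 0.99 turns up.

```python
import random

def ks_statistic_uniform(sample):
    # Largest gap between the empirical CDF and F(x) = x on [0, 1].
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

def simulate_null_ks(n=5, n_sims=20_000, seed=2):
    # Sorted K-S statistics from samples that really are uniform on [0, 1].
    rng = random.Random(seed)
    return sorted(
        ks_statistic_uniform([rng.random() for _ in range(n)])
        for _ in range(n_sims)
    )

null_d = simulate_null_ks()

# Simulated 5% critical value: the point exceeded by about 5% of null samples.
crit_95 = null_d[int(0.95 * len(null_d))]

# Fraction of null samples with a statistic at least as large as 0.99.
observed = 0.99
frac_as_extreme = sum(1 for d in null_d if d >= observed) / len(null_d)

print(crit_95, frac_as_extreme)
```

The simulated 5% point should land in the neighborhood of the tabled value for n = 5, and a statistic near 0.99 sits far beyond it, so far that a simulation of this size is unlikely to produce even one such value.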

I have been doing some simulations in R, to try to get a feel for it.

Stepping back, my real question is, if I have a small sample size, what
is the best way to "decide" whether or not it was generated by a
uniform distribution? Is K-S a good choice? Some references suggest
that Anderson-Darling is better but I don't know if that applies in the
case of the uniform distribution. Are there other tests I should be
considering?

Where do the data come from, and what are you doing with them?
Different tests usually are each 'better' at something-or-other.

Another question might be, What sort of *deviation* from
uniform will bother you? Are you doing hundreds of tests? (Why?)
Is the problem when all the numbers are in a narrow range? near
one particular end?

How many digits of accuracy are given, and is "continuous" important?
For instance, if two values are the same, is that devastating for
your purposes?


--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html