Re: Goodness of fitting of a distribution



In article <1163142366.664376.57650@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Reef Fish <large_nassua_grouper@xxxxxxxxx> wrote:

Herman Rubin wrote:
In article <1163112757.733760.179870@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Reef Fish <large_nassua_grouper@xxxxxxxxx> wrote:

Herman Rubin wrote:
In article <1162867197.824670.102540@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Reef Fish <large_nassua_grouper@xxxxxxxxx> wrote:

Beliavsky wrote:
nelson wrote:

...............

Those are two separate issues. The chi-square is poor
against long-tailed distributions because of lumping tail
observations into the end bin.

The tails of the distribution do not matter if
the cells have the same probability under the
distribution being tested. Nobody has claimed
that the cells should be of equal length.
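
A minimal sketch of the equal-probability cells described above, assuming an Exponential(1) null chosen purely for illustration. The helper names (equal_prob_edges, chi_square_stat, exp_quantile) are hypothetical and not from the thread. The cell boundaries come from the quantile function, so every cell has probability 1/k however long the tail is.

```python
import math

def equal_prob_edges(quantile, k):
    """Interior cell boundaries that give k cells of probability 1/k each."""
    return [quantile(i / k) for i in range(1, k)]

def chi_square_stat(data, edges):
    """Pearson chi-squared statistic when all cells have equal expected counts."""
    k = len(edges) + 1
    counts = [0] * k
    for x in data:
        # The cell index is the number of boundaries at or below x.
        counts[sum(1 for e in edges if e <= x)] += 1
    expected = len(data) / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts

def exp_quantile(p):
    """Quantile function of the Exponential(1) distribution (assumed null)."""
    return -math.log(1.0 - p)

# Four cells of probability 1/4; the last cell is the whole tail beyond log(4).
edges = equal_prob_edges(exp_quantile, 4)
```

The cells have very unequal lengths here, which is the point: equal probability under the tested distribution, not equal width.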

It's not bad at all for testing a U(0,1) distribution; binning
is not a problem there, since all bins are uniformly
distributed.

It certainly is. The K-S test has positive efficiency
for testing uniform against beta(1+e, 1+e), or even
against any beta alternative, where the chi-squared
test with more cells as the size increases has zero
efficiency. If one fixes the number of cells, many
alternatives cannot be tested.

Why should the order of the cells matter for testing the
uniformity of a U(0,1) distribution? For testing uniform
random numbers, I think the Chi-square test is on
everyone's list of tests, while the K-S is on none, to the
best of my recollection.

I suspect that any user's alternative hypothesis to
uniform (0,1) is likely to be in the form of a
density. In this case, everything I have stated
about the K-S versus chi-squared holds. I would
be more likely to suggest the Kuiper test, which
is not used as much but is likely to be more powerful
against most alternatives; both have good power.

For testing uniform random numbers, it can depend
on how the numbers are generated. But no matter
how, the chi-squared test has low power; there
are other tests, like the largest difference in
the probability of a bit, and even here, the
K-S test is not a bad test.

If scale is an important concern, the Kuiper test
is better. It looks at the deviation at only TWO
points, and those two points are not fixed; they
are the two extremes. This is the one I recommend
for "bump hunting".
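
For readers who want the two statistics side by side, here is a rough sketch against the U(0,1) cdf F(x) = x. The function name ks_and_kuiper_uniform is hypothetical. It returns only the statistics; critical values and p-values are not computed.

```python
def ks_and_kuiper_uniform(sample):
    """Return (D, V): K-S and Kuiper statistics against the U(0,1) cdf."""
    xs = sorted(sample)
    n = len(xs)
    # D+ : largest amount by which the empirical cdf exceeds F(x) = x.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # D- : largest amount by which F(x) = x exceeds the empirical cdf.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    # K-S uses the larger one-sided extreme; Kuiper adds the two extremes.
    return max(d_plus, d_minus), d_plus + d_minus

d, v = ks_and_kuiper_uniform([0.25, 0.75])
```

Kuiper's V is the sum of the two one-sided extremes, which is the sense in which it looks at two points that are not fixed in advance.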

I think your mind wandered off, Herman. I was talking
about the chi-square test being better than the K-S for
testing random numbers on U(0,1).

As I said, it depends on how they are formed,
and in any case, this is orthogonal to most uses
of significance testing. The alternatives against
which the chi-squared test has decent power are
not the ones I consider most likely to be a problem
even for testing random numbers.

For the person who is testing whether data have
a fixed distribution, or whether two samples
can be assumed to have the same distribution,
K-S beats chi-squared in almost any situation
in which the chi-squared would be using more
than a few classes, and Kuiper is likely to be
even better in most cases.

For short-tailed distributions, such as the Uniform, the
Chi-square goodness of fit ain't too bad. In fact, it is used as
ONE of the tests for pseudorandom number generators,
to test the uniformity of the distribution in ALL bins.

A more powerful test would be to look at the maximum
deviation, which is likely to be more powerful than the
chi-squared, or even the sum of the absolute deviations.
In this case, there is no ordering of the bins.
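
A small illustration of the three bin statistics mentioned here, with made-up counts (both count vectors are hypothetical, 100 observations in 10 equal-probability bins). It shows only what each statistic measures; it says nothing by itself about power.

```python
def bin_statistics(counts):
    """Chi-squared, max absolute deviation, and sum of absolute deviations
    for bins that all have the same expected count."""
    n = sum(counts)
    expected = n / len(counts)
    devs = [c - expected for c in counts]
    chi2 = sum(d * d for d in devs) / expected
    return chi2, max(abs(d) for d in devs), sum(abs(d) for d in devs)

# Deviation concentrated in two bins versus spread over eight bins.
concentrated = [16, 10, 10, 10, 10, 10, 10, 10, 10, 4]
spread = [13, 7, 13, 7, 13, 7, 13, 7, 10, 10]

chi2_c, max_c, sum_c = bin_statistics(concentrated)
chi2_s, max_s, sum_s = bin_statistics(spread)
```

Both vectors give the same chi-squared value, but the maximum deviation is twice as large for the concentrated one, so the chi-squared statistic cannot tell the two patterns apart while the maximum deviation can.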

A more powerful test against what? The maximum deviation
cannot possibly be more powerful than the uniform
chi-square, which takes ALL deviations of bins into consideration
rather than just the bin with the maximum deviation.

Why? The chi-squared test is a good test against
an alternative which has a spherically symmetric
alternate distribution. Also, the effect of
non-uniformity is likely to be greater if the
deviation is concentrated in a few places than
spread out more or less uniformly.

For all other distributions, no Data Analyst worth his salt would
even think about K-S, because of the effectiveness of the
Q-Q plot. The more you understand or think about HOW to
use (or examine) Q-Q plots for departure, the LESS you'll
be impressed by the Kolmogorov. Of course, mathematical
statisticians have their own way of mathemtistry to think up
reasons why K-S test is any good at all!

NO mathematical statistician has EVER thought of, or been
able to capture, the "small systematic departures" that elude
the K-S statistic every time. We, the Data Analysts, NEVER
miss that kind of systematic departure. That is why the Q-Q
visual test has NO analytic competitor that is even in the same
league.

Small systematic departures do not elude the K-S statistic;
they elude the chi-squared test. The small systematic
departures add up to a big departure in the cdf, and K-S,
or better Kuiper if it is in the middle, catches it.
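
A deterministic sketch of how small per-bin departures add up in the cdf. The alternative used here (density 1+eps on [0, 1/2) and 1-eps on [1/2, 1]) is an assumption chosen for illustration, not one named in the thread, and departure_summary is a hypothetical helper.

```python
def departure_summary(eps, k):
    """For density 1+eps on [0, 1/2) and 1-eps on [1/2, 1], with k (even)
    equal-width bins on (0,1), return
    (largest per-bin probability deviation, largest cdf deviation at bin edges)."""
    probs = [(1 + eps) / k if i < k // 2 else (1 - eps) / k for i in range(k)]
    bin_devs = [p - 1 / k for p in probs]
    running, cdf_dev = 0.0, 0.0
    for dev in bin_devs:
        running += dev  # cdf deviation at the right edge of this bin
        cdf_dev = max(cdf_dev, abs(running))
    return max(abs(dev) for dev in bin_devs), cdf_dev

bin_dev, cdf_dev = departure_summary(0.1, 100)
```

With eps = 0.1 and 100 bins, each bin is off by only 0.001 in probability, while the cdf is off by 0.05 at the midpoint: the per-bin departures all point the same way over half the range and accumulate.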

The K-S, or Kuiper, is close to a visual Q-Q test anyhow.
What are you likely to see? A large deviation.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@xxxxxxxxxxxxxxx Phone: (765)494-6054 FAX: (765)494-0558