Re: Goodness of fitting of a distribution




Herman Rubin wrote:
In article <1163112757.733760.179870@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Reef Fish <large_nassua_grouper@xxxxxxxxx> wrote:

Herman Rubin wrote:
In article <1162867197.824670.102540@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Reef Fish <large_nassua_grouper@xxxxxxxxx> wrote:

Beliavsky wrote:
nelson wrote:
hi all!
I have done some fit testing on a dataset. A quantile-quantile
plot suggests that the distribution that best fits my data is a
linear combination of a Weibull and a normal distribution. How can I
get a theoretical test that can confirm it? The people who work with me
want to see numbers, not only Q-Q plots. And they don't like the sum of
squared errors...

You can use the Kolmogorov-Smirnov test of goodness-of-fit

The Kolmogorov-Smirnov statistic is NOT a "goodness of fit" statistic.

It is the maximum absolute difference between a theoretical cdf and
an empirical cdf. It is a statistic sometimes used to measure
the DEPARTURE from a given cdf, rather than a "goodness of
fit".
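
To make the definition concrete, here is a minimal sketch of the one-sample statistic as the largest gap between the empirical cdf and a fully specified theoretical cdf. The function names, the seed, and the N(0,1) example are assumptions for illustration only, not code from anyone in this thread.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Largest gap between the empirical cdf of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical cdf jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def normal_cdf(x, mu=0.0, sigma=1.0):
    """cdf of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

random.seed(0)  # arbitrary seed, only for reproducibility
data = [random.gauss(0.0, 1.0) for _ in range(200)]
d_n = ks_statistic(data, normal_cdf)
print(f"D_n = {d_n:.4f}")
```

The sketch assumes the parameters of the theoretical cdf are fixed in advance; if they are estimated from the same data, the usual K-S critical values no longer apply.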

I do not know of any "goodness of fit" test which is
not a "badness of fit" test.

Then perhaps you don't know as many tests as you think, and
you also over-value the K-S stat which examines ONE value
(the max) in the difference between two cdfs.

This does not make it a bad test. Look at my paper in
the last Berkeley Symposium.

I don't need to read your Berkeley Symposium paper to know that the K-S
is a bad test for DATA ANALYSTS like myself, for the reason
stated below.

It is a TERRIBLE measure of "goodness of fit" because it looks
at only the point of MAXIMUM discrepancy.

Terrible? Very definitely NOT. In a given situation,
there may well be better tests, but it has comparable
power to parametric tests, and it is a universal test.

It is the chi-squared test with many classes which has
little power, and the combination of local discrepancies
with the same direction adds greatly to the power. The
maximum takes advantage of this.

You found one that is WORSE than the K-S test. :-)

And even THAT is not strictly worse except against LONG-tailed
distributions, where a Chi-square goodness of fit
test necessarily lumps the information in the TAILS, which is
the most telling, into the end bins.

No, it is the large number of bins which reduces the
power. Also, the chi-squared test ignores the order
of the cells; that the larger cells are close together
generally provides more information than the individual
cell differences. I have found far-out p values in
DISCRETE problems, where K-S is conservative, and
chi-squared finds nothing. These were two-sample
tests, but the principle is the same.
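
A small one-sample sketch of the point about cell order: the chi-squared statistic is unchanged when the same deviations are moved to different cells, while a K-S-style cumulative statistic on the binned data is not. The counts below are made up for illustration, and the helper names are hypothetical.

```python
def chi_squared(counts, expected):
    """Pearson chi-squared statistic; ignores where each cell sits."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

def max_cumulative_gap(counts, expected):
    """K-S-style statistic on bins: largest gap between cumulative proportions."""
    n = sum(counts)
    total_e = sum(expected)
    gap = 0.0
    cum_o = cum_e = 0.0
    for o, e in zip(counts, expected):
        cum_o += o
        cum_e += e
        gap = max(gap, abs(cum_o / n - cum_e / total_e))
    return gap

expected = [100] * 8
clustered = [110, 110, 110, 110, 90, 90, 90, 90]    # excesses sit next to each other
alternating = [110, 90, 110, 90, 110, 90, 110, 90]  # same deviations, interleaved

for name, counts in (("clustered", clustered), ("alternating", alternating)):
    print(name, chi_squared(counts, expected), max_cumulative_gap(counts, expected))
```

Both count vectors give the same chi-squared value, but the cumulative gap is four times larger when the excesses are adjacent.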

Those are two separate issues. The chi-square is poor
against long-tailed distributions because of lumping tail
observations into the end bin.

It's not bad at all for testing U(0,1) distribution because
binning is not a problem since all bins are uniformly
distributed.

Why should the order of the cells matter for testing the
uniformity of a U(0,1) distribution? For testing uniform
random numbers, I think the Chi-square test is on
everyone's list of tests, while the K-S is on none, to the
best of my recollection.
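
For reference, a minimal sketch of the chi-squared uniformity check being discussed, with equal-width bins on U(0,1). The bin count, sample size, and seed are arbitrary assumptions, and Python's built-in generator stands in for whatever generator is under test.

```python
import random

def uniform_chi_squared(sample, n_bins):
    """Pearson chi-squared statistic for U(0,1) with equal-width bins."""
    counts = [0] * n_bins
    for u in sample:
        # min() guards against u == 1.0 landing outside the last bin.
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    expected = len(sample) / n_bins  # every bin has the same expected count
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts

random.seed(1)  # arbitrary seed, only for reproducibility
draws = [random.random() for _ in range(10_000)]
stat, counts = uniform_chi_squared(draws, n_bins=20)
df = 20 - 1
print(f"chi-squared = {stat:.2f} on {df} degrees of freedom")
```

Under uniformity the statistic is approximately chi-squared distributed with n_bins - 1 degrees of freedom, so values far above that number point to non-uniform bins.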

If scale is an important concern, the Kuiper test
is better, and this only looks at the deviation at
TWO points, but the two points are not fixed, but
are the two extremes. This is the one I recommend
for "bump hunting".
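
The Kuiper statistic is usually written V = D+ + D-, the sum of the largest deviation of the empirical cdf above the theoretical cdf and the largest deviation below it. A minimal sketch, with hypothetical function names and an arbitrary U(0,1) example:

```python
import random

def kuiper_statistic(sample, cdf):
    """Kuiper's V: largest positive plus largest negative cdf deviation."""
    xs = sorted(sample)
    n = len(xs)
    # D+: empirical cdf just after each jump, minus the theoretical cdf.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: theoretical cdf minus the empirical cdf just before each jump.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus + d_minus

def uniform_cdf(u):
    """cdf of U(0,1), clipped outside the unit interval."""
    return min(max(u, 0.0), 1.0)

random.seed(3)  # arbitrary seed, only for reproducibility
draws = [random.random() for _ in range(500)]
v_n = kuiper_statistic(draws, uniform_cdf)
print(f"V_n = {v_n:.4f}")
```

The two extremes can occur anywhere in the sample, which is why the statistic looks at two points that are not fixed in advance.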

I think your mind wandered off, Herman. I was talking
about the chi-square test being better than the K-S for
testing random numbers on U(0,1).

For a short-tailed distribution, such as the Uniform, the Chi-square
goodness of fit ain't too bad. In fact, it is used as
ONE of the tests for pseudorandom number generators,
to test the uniformity of the distribution in ALL bins.

A more powerful test would be to look at the maximum
deviation, which is likely to be more powerful than the
chi-squared, or even the sum of the absolute deviations.
In this case, there is no ordering of the bins.

A more powerful test against what? The maximum deviation
cannot possibly be more powerful than the uniform chi-square,
which takes ALL deviations of bins into consideration
rather than just the bin with the maximum deviation.
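
The two statistics in dispute can be put side by side on binned counts. The sketch below uses made-up counts (10 bins, 100 expected per bin) and hypothetical helper names. It only shows that the statistics respond to different patterns of deviation and does not by itself settle the power question.

```python
def chi_squared(counts, expected):
    """Pearson chi-squared with the same expected count in every bin."""
    return sum((c - expected) ** 2 / expected for c in counts)

def max_abs_deviation(counts, expected):
    """Largest absolute deviation of any single bin from its expected count."""
    return max(abs(c - expected) for c in counts)

expected = 100
one_spike = [130, 97, 97, 96, 97, 97, 96, 97, 97, 96]        # one bin far off
spread_out = [110, 90, 110, 90, 110, 90, 110, 90, 110, 90]   # every bin a little off

for name, counts in (("one spike", one_spike), ("spread out", spread_out)):
    print(name, round(chi_squared(counts, expected), 2),
          max_abs_deviation(counts, expected))
```

The chi-squared values are nearly identical for the two vectors, while the maximum deviation is three times larger for the single spike, so which statistic does better depends on the alternative in question.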

For all other distributions, no Data Analyst worth his salt would
even think about K-S, because of the effectiveness of the
Q-Q plot. The more you understand or think about HOW to
use (or examine) Q-Q plots for departure, the LESS you'll
be impressed by the Kolmogorov. Of course, mathematical
statisticians have their own way of mathematistry to think up
reasons why the K-S test is any good at all!

NO mathematical statistician has EVER thought of, or been
able to capture, the "small systematic departures" that elude
the K-S statistic every time. We, the Data Analysts, NEVER
miss that kind of systematic departure. That is why the Q-Q
visual test has NO analytic competitor that is even in the same
league.
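
For anyone who wants to produce the points behind a Q-Q plot, here is a minimal sketch against a fitted normal. The plotting positions (i + 0.5)/n with zero-based i are one common convention and are an assumption here, as are the function name and the simulated data.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Pairs (theoretical quantile, ordered observation) for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # normal fitted by sample moments
    # Zero-based i, so (i + 0.5)/n stays strictly between 0 and 1.
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

random.seed(2)  # arbitrary seed, only for reproducibility
data = [random.gauss(10.0, 2.0) for _ in range(100)]
points = normal_qq_points(data)

# Points near the line y = x indicate agreement; systematic curvature does not.
largest_gap = max(abs(q - x) for q, x in points)
print(f"largest vertical gap from the line y = x: {largest_gap:.3f}")
```

The pairs can be handed to any plotting tool; the pattern of the departures from the line, rather than any single number, is what the Q-Q argument above relies on.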

-- Reef Fish Bob.


--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@xxxxxxxxxxxxxxx Phone: (765)494-6054 FAX: (765)494-0558
