Re: Parametric Bootstrapped Kolmogorov-Smirnov GoF test



Shiazy wrote:
Thank you David!

I have some questions about your reply ... (see below)

On May 30, 11:13 am, "David Jones" <dajx...@xxxxxxxxx> wrote:
shi...@xxxxxxxxx wrote:
Hi,

I want to perform a one-sample parametric bootstrapped Kolmogorov-Smirnov
GoF test. After reading some tutorials found on the internet,
I've decided to do this:

This seems wrong. However you approach this, you need to be doing
something that will replicate the effects of fitting the parameters.



Why is this wrong? I have data from which I estimated parameters for a
certain distribution. So, to verify the goodness of fit between the
initial data values and the theoretical distribution, I've read that I
have to perform a parametric GoF test. Is that right?


The basic form of the GoF tests (in particular, the critical values against which you judge the sample GoF
values) assumes that you are judging the sample data against a pre-specified, known distribution (with no fitted
parameters). For a given sample, the GoF statistic evaluated for a fitted distribution is very likely to be
smaller than the statistic evaluated for the "true" distribution from which the sample actually comes. This is
because the act of fitting a distribution is such as to choose a distribution that is close to the sample data
in some sense. Thus using a fitted distribution in the GoF test might be regarded as introducing a "bias". The
simulations need to be conducted in such a way as to include essentially the same "bias" effect.

Looked at another way, you have a certain test statistic derived in a particular way from the sample data. You
want to do simulations in which simulated values of the test statistic are derived in that same particular way
from each set of simulated data. This "particular way" needs to include all uses made of the sample data in
deriving the sample statistic.
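
The effect described above can be seen in a small simulation. The following is only a sketch under assumptions not in the thread: the Weibull parameters, sample size and number of replicates are arbitrary, and fit.weibull is a hypothetical helper that computes the maximum-likelihood fit with base R. For each simulated sample it compares the K-S statistic against the true distribution with the K-S statistic against the distribution fitted to that same sample.

```r
set.seed(1)

# Hypothetical helper (not from the thread): Weibull maximum-likelihood fit.
# The shape solves the profile score equation; the scale follows from it.
fit.weibull <- function(x) {
  lx <- log(x)
  score <- function(k) sum(x^k * lx) / sum(x^k) - 1 / k - mean(lx)
  k <- uniroot(score, interval = c(0.05, 50))$root
  c(shape = k, scale = mean(x^k)^(1 / k))
}

# Assumed settings, chosen only for illustration
n <- 50; shape.true <- 2; scale.true <- 1; reps <- 500

d.true <- numeric(reps)  # K-S statistic against the true distribution
d.fit  <- numeric(reps)  # K-S statistic against the fitted distribution
for (r in seq_len(reps)) {
  x <- rweibull(n, shape.true, scale.true)
  est <- fit.weibull(x)
  d.true[r] <- ks.test(x, "pweibull", shape.true, scale.true)$statistic
  d.fit[r]  <- ks.test(x, "pweibull", est[["shape"]], est[["scale"]])$statistic
}

# Average statistic under each approach, and how often fitting shrinks it
print(c(mean.true = mean(d.true), mean.fitted = mean(d.fit)))
frac.smaller <- mean(d.fit < d.true)
print(frac.smaller)
```

Critical values derived for the first column of statistics would be too lenient when applied to the second, which is why the simulations have to repeat the fitting step.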




1. Given sample X_i, i=1,...,n, estimate parameters theta
(theta_hat) for a certain theoretical distribution F(x; theta),
obtaining F(x; theta_hat)
2. Calculate K-S statistic from sample X and F(x; theta_hat) (say
ks.stat) (one-sample K-S test)
3. Execute B iterations (B=999); for each iteration i:
3.1 sample from F(x; theta_hat) a sample Y_j, j=1,...,n

3.2_repl_ Fit the distribution to Y_j, j=1,...,n, giving parameters
theta_hat_sample

3.2_repl_2 Calculate K-S statistic from sample Y and fitted
parameters theta_hat_sample (one-sample K-S test), giving say
ks.bstats[i]

Let me verify that I understand what you suggested. Suppose the
theoretical distribution is, say, a Weibull. Let X be an array containing
the X_i sample.

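--- [R-code] ---

library(MASS); # for fitdistr(); using it for the fit is an assumption, any estimator could be used

n <- length(X); # the sample size
B <- 999;       # number of bootstrap iterations (B=999 in step 3)

# Let shape.hat and scale.hat be the two Weibull parameters fitted from X
fit <- fitdistr( X, "weibull" );
shape.hat <- fit$estimate[["shape"]];
scale.hat <- fit$estimate[["scale"]];

# Do a one-sample K-S test between X and the fitted Weibull;
# ks.test() returns a list, so keep only the statistic
ks.stat <- ks.test( X, "pweibull", shape.hat, scale.hat )$statistic;

# Perform bootstrap test
ks.stats <- numeric( B-1 );
for ( i in 1:(B-1) )
{
    # draw a sample from Weibull (step 3.1)
    Y <- rweibull( n, shape.hat, scale.hat );

    # Let shape.hat.sample and scale.hat.sample be the two Weibull
    # parameters fitted from Y
    fit.sample <- fitdistr( Y, "weibull" );
    shape.hat.sample <- fit.sample$estimate[["shape"]];
    scale.hat.sample <- fit.sample$estimate[["scale"]];

    # Do a one-sample K-S test between Y and Weibull with
    # shape.hat.sample and scale.hat.sample parameters
    ks.stats[i] <- ks.test( Y, "pweibull", shape.hat.sample,
                            scale.hat.sample )$statistic;
}

p.value <- (1 + sum( ks.stats >= ks.stat )) / B;
--- [/R-code] ---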

I think this should be
p.value <- sum( ks.stats >= ks.stat ) /( B+1);

however you might find it interesting to construct a histogram of the values in ks.stats and add the point
ks.stat as a special symbol for comparison.
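
A minimal sketch of such a plot follows. It is self-contained, so it makes several assumptions that are not in the thread: X is simulated stand-in data, B is kept small for speed, and fit.weibull and ks.fitted are hypothetical helpers. Only the names ks.stat and ks.stats are taken from the code above.

```r
set.seed(123)

# Hypothetical helper: Weibull maximum-likelihood fit using base R only
fit.weibull <- function(x) {
  lx <- log(x)
  score <- function(k) sum(x^k * lx) / sum(x^k) - 1 / k - mean(lx)
  k <- uniroot(score, interval = c(0.05, 50))$root
  c(shape = k, scale = mean(x^k)^(1 / k))
}

# Hypothetical helper: K-S statistic of x against the Weibull fitted to x
ks.fitted <- function(x) {
  est <- fit.weibull(x)
  unname(ks.test(x, "pweibull", est[["shape"]], est[["scale"]])$statistic)
}

X <- rweibull(40, shape = 1.5, scale = 2)  # stand-in data (assumption)
est <- fit.weibull(X)
ks.stat <- ks.fitted(X)

B <- 199  # assumed number of bootstrap replicates, kept small for speed
ks.stats <- replicate(
  B, ks.fitted(rweibull(length(X), est[["shape"]], est[["scale"]])))

# Histogram of the bootstrap statistics, with the observed statistic
# marked on the x-axis as a red star
h <- hist(ks.stats, breaks = 20, xlim = range(c(ks.stats, ks.stat)),
          main = "Bootstrap K-S statistics", xlab = "K-S statistic")
points(ks.stat, 0, pch = 8, col = "red", cex = 2)
```

If the star falls far into the right tail of the histogram, the fit looks poor relative to what the fitting procedure normally produces.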


(I hope you know R code)

But in doing so, is there a risk of adding a "double" bias?
1. one bias related to the fact that I perform the test against the same
data set (X_i) from which I've estimated the distribution parameters
(this should be mitigated by doing a long bootstrap, say 1000
iterations).

This may be what you describe: there is a possible bias arising from the fact that the true distribution of the
test statistic depends on the (unknown) parameters of the distribution from which the data actually arise. This
effect does not diminish as the number of bootstrap samples increases. It does decrease as the size of the
original sample increases. However, for some parameters there is sometimes no effect at all: for example, if the
distribution and fitting procedure are location and/or scale invariant, then the distribution of the test
statistic does not depend on these parameters. In your case, it is likely that you have a fitting procedure
which is scale invariant, and thus the true distribution of the GoF test statistic would depend only on the
shape parameter. You could explore this by evaluating the distribution of the test statistic (still accounting
for the effect of fitting) for a range of known scale and shape parameters.
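
One way to carry out the exploration suggested above is sketched here. The parameter grid, sample size and number of replicates are arbitrary assumptions, and fit.weibull and sim.ks are hypothetical helpers. For each (shape, scale) pair the code simulates samples, refits the Weibull to each sample, and records the upper 5% point of the resulting K-S statistics.

```r
set.seed(42)

# Hypothetical helper: Weibull maximum-likelihood fit using base R only
fit.weibull <- function(x) {
  lx <- log(x)
  score <- function(k) sum(x^k * lx) / sum(x^k) - 1 / k - mean(lx)
  k <- uniroot(score, interval = c(0.05, 50))$root
  c(shape = k, scale = mean(x^k)^(1 / k))
}

# Hypothetical helper: simulated K-S statistics, refitting on every sample
sim.ks <- function(shape, scale, n = 30, reps = 200) {
  replicate(reps, {
    y <- rweibull(n, shape, scale)
    est <- fit.weibull(y)
    unname(ks.test(y, "pweibull", est[["shape"]], est[["scale"]])$statistic)
  })
}

# Assumed grid of known parameters
grid <- expand.grid(shape = c(0.5, 2, 5), scale = c(0.1, 1, 10))

# Upper 5% point of the statistic for each parameter combination
grid$q95 <- mapply(
  function(sh, sc) quantile(sim.ks(sh, sc), 0.95, names = FALSE),
  grid$shape, grid$scale)

print(grid)
```

If the distribution of the statistic does not depend on a parameter, the q95 values should agree across that parameter up to simulation noise; increasing reps reduces the noise.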



2. another bias from estimating the parameters from Y_i, which is a
sample drawn from the same distribution.
I think point #2 will always give an optimistic result (i.e. the null
hypothesis can't be rejected).

The "bias" outlined above would result in the test statistic being "optimistic", in this sense, if judged
against a distribution for the test statistic which did not properly account for the fact that you are comparing
a sample with a distribution function that has been fitted to that sample.


If this is the way the test has to be performed, can you give a
bibliographic reference? I'd like to better understand these techniques.

Stephens 1974 JASA, 69, 730-737
Stephens 1976 Ann.Statist, 4, 357-369
Stephens 1977 Biometrika, 64, 583-588

Section 5.5 of "Empirical Processes with Applications to Statistics" by Shorack & Wellner (Wiley,1986)


David Jones

