Using the chi-square goodness-of-fit test for large cell counts



Hi there! I'm trying to apply a chi-square goodness-of-fit test to
see if a distribution has changed significantly.

In particular, there are a certain number of events that occur in a
time interval in my system. I've recently changed the system in a way
that could, theoretically, have changed the distribution of events.

For the sake of argument, here are the distributions of events before
the system was changed (baseline), and the distribution after the
system was changed (experiment):

events:         1      2     3    4   5   6  7  8
baseline:   48366  11115  1088  126  25   4  1  1
experiment: 48595  10834  1073  125  40  15  3  1
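
To make the comparison less dependent on squinting, here is a minimal sketch that puts both rows on the same scale by converting the counts to proportions. The data frame name `props` and its column names are only illustrative choices, not part of any existing analysis.

```r
# Counts copied from the table above
b <- c(48366, 11115, 1088, 126, 25, 4, 1, 1)    # baseline
x <- c(48595, 10834, 1073, 125, 40, 15, 3, 1)   # experiment

# Convert each row to proportions of its own total
props <- data.frame(
  events     = 1:8,
  baseline   = b / sum(b),
  experiment = x / sum(x)
)

# Per-category change in proportion (experiment minus baseline)
props$difference <- props$experiment - props$baseline

print(props, digits = 3)
```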

If I squint, my intuition says, "there is no real difference between
these distributions."

But R's "chisq.test" disagrees with me:

> b = c(48366,11115,1088,126,25,4,1,1)
> x = c(48595,10834,1073,125,40,15,3,1)
> chisq.test(x, p=b, rescale.p=TRUE)

Chi-squared test for given probabilities

data: x
X-squared = 51.6607, df = 7, p-value = 6.81e-09
>
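
As a rough sketch of where that statistic comes from, the per-cell terms (observed - expected)^2 / expected can be pulled out of the object that chisq.test returns (its `observed` and `expected` components). The names `fit`, `contrib` and `cell_table` are just illustrative, and suppressWarnings is used here on the assumption that R warns about the small expected counts in the last few cells.

```r
b <- c(48366, 11115, 1088, 126, 25, 4, 1, 1)    # baseline
x <- c(48595, 10834, 1073, 125, 40, 15, 3, 1)   # experiment

# Same call as above; warnings about small expected counts are silenced
fit <- suppressWarnings(chisq.test(x, p = b, rescale.p = TRUE))

# Contribution of each cell to X-squared: (O - E)^2 / E
contrib <- as.numeric((fit$observed - fit$expected)^2 / fit$expected)

cell_table <- data.frame(
  events       = 1:8,
  observed     = x,
  expected     = round(as.numeric(fit$expected), 1),
  contribution = round(contrib, 2)
)
print(cell_table)

# The contributions add up to the reported X-squared statistic
total <- sum(contrib)
print(total)
```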

If I cheat, and rescale the counts, I get much more "believable" (to
me!) p-values that show the differences to be insignificant:

> chisq.test(x/10, p=b, rescale.p=TRUE)$p.value
[1] 0.6397052
> chisq.test(x/100, p=b, rescale.p=TRUE)$p.value
[1] 0.9993834
>
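
For what it's worth, the effect of the rescaling can be checked directly. The sketch below assumes the same two count vectors and shows that dividing the counts by 10 divides X-squared by 10 while df stays at 7, which is all that changes the p-value. The variable names are illustrative only.

```r
b <- c(48366, 11115, 1088, 126, 25, 4, 1, 1)    # baseline
x <- c(48595, 10834, 1073, 125, 40, 15, 3, 1)   # experiment

# Full counts and counts divided by 10 (warnings about small
# expected counts are silenced)
full  <- suppressWarnings(chisq.test(x,      p = b, rescale.p = TRUE))
tenth <- suppressWarnings(chisq.test(x / 10, p = b, rescale.p = TRUE))

stat_full  <- unname(full$statistic)
stat_tenth <- unname(tenth$statistic)

# Ratio of the two statistics
ratio <- stat_full / stat_tenth
print(ratio)

# p-value obtained by shrinking the full statistic by 10, same df
p_from_scaled <- pchisq(stat_full / 10, df = 7, lower.tail = FALSE)
print(c(p_from_scaled, tenth$p.value))
```

So the rescaled calls run the same test on the same proportions, but as if the sample were 10 or 100 times smaller.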

Is it inappropriate to use this test on "too much data"?

Thanks in advance for any advice...

chris