Computing Chi-square stats on Card deck
- From: "Bill Taylor" <w.taylor@xxxxxxxxxxxxxxxxxxxxx>
- Date: 26 Jan 2006 21:01:07 -0800
No-one seems to have answered this, so I'll have a quick flick.
You might have got a bigger response in sci.math.stat, but anyway:
>>>>
Perform 100,000 shuffles, and check the sequence of cards
in each. A table is created as follows :
Columns correspond to the position of the card from the top of deck,
Rows correspond to Cards. The frequency of occurrence of each card
in each position is tabulated as :
Position ->   1     2     3     4     ......  52
A-Clubs       1992  1925  1966  1924  .....   1941
2-Clubs       1918  1916  ...........................
3-Clubs       1849  1973  ...........................
.......
K-Spades      1912  1974  ...........................
Expected value of each cell = 100,000/52
Chi-square is calculated for each cell as
(cellvalue-exp.value)^2 / exp. value
The values in all cells are totalled, and the Chi square
statistic is calculated on the total with degrees of freedom
as 51 * 51 (51 being No. of rows-1, and 51 being No. of columns -1)
<<<<
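The procedure quoted above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: Python's own `random.shuffle` stands in for whatever shuffler is actually being tested, cards are numbered 0..51 instead of being named, and the function names `tabulate_shuffles` and `chi_square_total` are made up for this sketch. It computes the total only; it does not check how well that total follows a chi-square law.

```python
import random

DECK_SIZE = 52  # assumption: cards are represented by the integers 0..51


def tabulate_shuffles(num_shuffles, seed=0):
    """Return table[card][position] = number of times card landed at position."""
    rng = random.Random(seed)  # stand-in for the shuffler under test
    table = [[0] * DECK_SIZE for _ in range(DECK_SIZE)]
    deck = list(range(DECK_SIZE))
    for _ in range(num_shuffles):
        rng.shuffle(deck)
        for position, card in enumerate(deck):
            table[card][position] += 1
    return table


def chi_square_total(table, num_shuffles):
    """Sum (cell value - expected)^2 / expected over every cell of the table."""
    expected = num_shuffles / DECK_SIZE
    return sum((obs - expected) ** 2 / expected for row in table for obs in row)
```

Calling `tabulate_shuffles(100_000)` and passing the result to `chi_square_total(table, 100_000)` gives the total described in the quoted text, to be set against 51 * 51 = 2601 degrees of freedom.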
All this is correct. The key thing about the DoF being 51*51
is that this is the number of cells you (or the gods) can freely
choose before all the others become fixed by the constraints of
the problem. In this case, those constraints are that all rows
and all columns must add up to 100,000.
> Now my doubt is :
> On row 1, we're creating the statistics for a single card - namely
> the Ace of clubs. When we come to the second column of that row,
> the first card has already been dealt, and we have only 51 cards left.
This is true, but only if you persist in treating the 1st row
as a bunch of independent RVs, and the 2nd row as being a bunch
of CONDITIONAL RVs, conditional on the values of the first, etc.
But this is artificially singling out one row (and in fact all
rows according to your permutation order, 1,2,3,4... in this case).
There is no need to do this. You CAN do it, but it just makes
the whole thing hopelessly intractable. Best to treat all the rows
as "exchangeable" (not quite independent), so they are all on
the same footing. This is standard. The physical order you
did things in is quite irrelevant to anything statistical.
> So shouldn't the expectancy be (100000-1992)/51 instead of 100000/52 ?
So, no.
They are exchangeable, so all have the same distribution, mean, etc.
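One way to convince yourself of this is to enumerate every ordering of a tiny deck and count where each card lands. The sketch below assumes a perfectly fair shuffle (every ordering equally likely) and uses a 4-card deck only because 52! orderings cannot be enumerated; `position_probabilities` is a name invented for this illustration.

```python
from fractions import Fraction
from itertools import permutations


def position_probabilities(deck_size):
    """Exact P(card is at position) when all orderings are equally likely."""
    counts = [[0] * deck_size for _ in range(deck_size)]
    orderings = list(permutations(range(deck_size)))
    for ordering in orderings:
        for position, card in enumerate(ordering):
            counts[card][position] += 1
    return [[Fraction(c, len(orderings)) for c in row] for row in counts]


probs = position_probabilities(4)
# Every card has probability 1/4 of being at every position, including
# position 2: no conditioning on what was dealt first is involved.
print(all(p == Fraction(1, 4) for row in probs for p in row))  # -> True
```

By the same symmetry, every cell of the 52 x 52 table has probability 1/52, hence expected value 100000/52.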
> The reason this doubt arose is because I've seen the same analysis
> being performed by using only "n" cards drawn from the top of the deck
> ("n" being typically a small number like say 7 cards),
> and when we do that the degrees of freedom assumed is (n * 51).
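Yes; the key thing about the degrees of freedom is as I said above.
51*n is correct here: each of the n columns must still add up to
100,000, but the rows are no longer constrained, so each column
has 51 free cells.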
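For the top-n variant the table has 52 rows but only n columns. A rough sketch, again assuming `random.shuffle` as a stand-in for the shuffler under test and cards numbered 0..51 (the name `top_n_chi_square` is made up here):

```python
import random

DECK_SIZE = 52  # assumption: cards are represented by the integers 0..51


def top_n_chi_square(num_shuffles, n, seed=0):
    """Tabulate only the top n positions; return (counts, statistic, dof)."""
    rng = random.Random(seed)  # stand-in for the shuffler under test
    counts = [[0] * n for _ in range(DECK_SIZE)]
    deck = list(range(DECK_SIZE))
    for _ in range(num_shuffles):
        rng.shuffle(deck)
        for position in range(n):
            counts[deck[position]][position] += 1
    expected = num_shuffles / DECK_SIZE
    statistic = sum(
        (obs - expected) ** 2 / expected for row in counts for obs in row
    )
    dof = (DECK_SIZE - 1) * n  # 51 free cells in each of the n columns
    return counts, statistic, dof
```

Each column of `counts` sums to the number of shuffles, while the row sums are free to vary; with n = 7 the function reports 357 degrees of freedom.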
> [as] the frequency values in the last column are completely predictable
Yes; but the figures for ANY column are fixed, GIVEN all the others.
It sounds like you want to "leave out" the last column from your
calculations, but again, this would be giving unwarranted significance
to some particular column. You must not do this. It seems funny,
I know, to be adding up 52 of something when only 51 are "really there"
in some sense. But it is in fact the correct thing to do, though
exactly WHY it is correct is not well explained (i.e. it is usually
glossed over!) in lecture courses and even textbooks.
One can get a glimpse of why, without TOO much work, if you work
out the precise theory for the case of just two categories (n=2).
There, though there is only one df, one nevertheless has to add up
BOTH figures (even though they are effectively the same!), to get
the exactly right answer.
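A quick numerical check of the two-category case, assuming the categories are equally likely (so the count in the first one is binomial with mean total/2 and variance total/4). The function name `two_category_terms` is invented for this sketch.

```python
def two_category_terms(observed_first, total):
    """Return both chi-square terms and the squared standardized count."""
    expected = total / 2
    observed_second = total - observed_first
    term_first = (observed_first - expected) ** 2 / expected
    term_second = (observed_second - expected) ** 2 / expected
    # Squared z-score of the first count; its variance is total / 4.
    z_squared = (observed_first - expected) ** 2 / (total / 4)
    return term_first, term_second, z_squared


print(two_category_terms(60, 100))  # -> (2.0, 2.0, 4.0)
```

The two terms are identical, and only their sum equals the squared z-score, which is the quantity with a chi-square distribution on one df. Either term alone would be half of it.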
HTH.
-------------------------------------------------------------------------
Bill Taylor W.Taylor@xxxxxxxxxxxxxxxxxxxxx
-------------------------------------------------------------------------
Yes it may be easy to lie with statistics,
but it's easier still to lie without them!
-------------------------------------------------------------------------