Re: Finding Statistically Significant Rules
- From: hgwelec <hgwelec@xxxxxxxxx>
- Date: 11 May 2007 06:33:08 -0700
On May 11, 1:22 pm, Ray Koopman <koop...@xxxxxx> wrote:
On May 11, 1:03 am, hgwelec <hgwe...@xxxxxxxxx> wrote:
Hi Ray, and thanks for your reply.
If I understood correctly, you are suggesting some sort of
cross-validation, keeping track of how the 85% accuracy changes in each
fold. I am not a statistician, but this seems to me something
like an "empirical" rule.
With a chi-square test you can quantify statistical
significance and present your findings, say in a scientific paper,
but how can I quantify the significance in the way you described?
Again, sorry if I am totally mistaken about this.
Thanks,
Hgwelec
Have I misunderstood something? You developed a classifier that
turned out to be 85% accurate in its development sample. If you
reanalyze your data R = 5000 or so times, each time following the
same procedure you used initially, but each time randomly permuting
the N = 700 values of the client variable (i.e., randomly reassigning
the N observed values to different clients), you will get R different
accuracies, one for each redeveloped classifier of the randomized
data. Each time you use all N cases. There is no hold-out sample.
OK so far?
The "significance" of your classifier's accuracy -- the quantity that
corresponds to the p-value from a statistic such as chi-square --
is the proportion of the R accuracies that equal or exceed the
accuracy of the classifier that was developed on the unpermuted data.
This is known as a permutation test and is perfectly respectable
scientifically.
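
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fit_and_score` is a hypothetical stand-in for whatever model-building procedure was actually used (here a one-threshold rule on a single numeric feature), the "client variable" is taken to be the outcome being predicted, and the function and parameter names are mine, not from the thread. The p-value is the plain proportion of the R permuted accuracies that equal or exceed the observed one.

```python
import random


def fit_and_score(xs, ys):
    """Hypothetical stand-in for the full classifier-development procedure.

    Fits a one-threshold rule on a single numeric feature and returns its
    accuracy on the same data (all N cases, no hold-out sample).
    """
    n = len(ys)
    best = 0
    for t in sorted(set(xs)):
        # Cases classified correctly by the rule "predict 1 when x >= t".
        hits = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
        # n - hits is the accuracy count of the reversed rule.
        best = max(best, hits, n - hits)
    return best / n


def permutation_p_value(xs, ys, r=5000, seed=0):
    """Return (observed accuracy, permutation p-value).

    Each of the r repetitions shuffles the outcome values across cases,
    redevelops the classifier from scratch, and records its accuracy.
    """
    rng = random.Random(seed)
    observed = fit_and_score(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(r):
        rng.shuffle(shuffled)
        if fit_and_score(xs, shuffled) >= observed:
            at_least_as_good += 1
    return observed, at_least_as_good / r


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    demo_rng = random.Random(42)
    ages = [demo_rng.randint(18, 80) for _ in range(100)]
    outcome = [1 if a + demo_rng.gauss(0, 15) > 50 else 0 for a in ages]
    print(permutation_p_value(ages, outcome, r=500))
```

The important design point is that the whole development procedure is rerun inside the loop, so any rule search or tuning done on the real data is repeated on each permuted data set. A smaller R gives a coarser p-value but the same logic.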
Hi again Ray,
As already mentioned, I am not a statistician. Your explanation made it
crystal clear, but unfortunately 5000 repeats of this procedure cannot
be performed.
I am thinking of discretizing AGE and NUM_OF_CHILDREN and then
performing a chi-square test.
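
A rough sketch of that idea, assuming hypothetical bin edges and made-up data: each variable is cut into a few categories, cross-tabulated against the outcome, and the Pearson chi-square statistic and its degrees of freedom are computed. The helper names and the cut points are illustrative only. The statistic would then be compared with a chi-square table or passed to a statistics package to get a p-value.

```python
from collections import Counter


def discretize(value, edges):
    """Return the index of the bin holding value, given ascending cut points."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def chi_square(rows, cols):
    """Pearson chi-square statistic and degrees of freedom for two
    equally long lists of category labels."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    cells = Counter(zip(rows, cols))
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


if __name__ == "__main__":
    # Made-up values, for illustration only.
    age = [22, 25, 31, 38, 44, 47, 52, 58, 63, 70]
    outcome = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
    age_bins = [discretize(a, [30, 50]) for a in age]  # hypothetical cut points
    print(chi_square(age_bins, outcome))
```

The same two functions apply to NUM_OF_CHILDREN with its own cut points. With small samples some expected cell counts will be low, in which case the chi-square approximation is less trustworthy.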
Thanks,
Hgwelec