Re: Finding Statistically Significant Rules



On May 11, 10:43 am, Ray Koopman <koop...@xxxxxx> wrote:
On May 11, 12:11 am, hgwelec <hgwe...@xxxxxxxxx> wrote:



Dear All,

I have used a C4.5 decision tree to make an analysis. The analysis
(classification) is about finding the common characteristics of "good"
clients.

Say, for example, that the decision tree produces the following
"rule":

IF AGE >32
AND NUM_OF_CHILDREN > 2
AND CLIENT_PROFESSION="DOCTOR"
AND GENDER="MALE"
THEN
CLIENT="GOOD"

Now, the above rule has 85% accuracy and 25% coverage on the
dataset.
The dataset consists of 700 cases.

What would I have to do to assess whether this result is NOT
attributable to pure chance?
A chi-square test clearly cannot be used since AGE and NUM_OF_CHILDREN
are not categorical variables.

Any help is greatly appreciated.

Hgwelec

Reanalyze the data with the target variable permuted randomly.
Do this a few thousand times, keeping track of the accuracy of
the classifications. Look at the distribution of accuracies.
Where does your 85% figure stand in that distribution?
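
A rough sketch of the permutation procedure described above, in Python
(standard library only). The function names, the synthetic 700-case
dataset, and the 40% rate of GOOD clients outside the rule are all
assumptions made for illustration; none of them come from the original
data. As a simplification, the rule is held fixed while the labels are
shuffled. The procedure as described would re-run the whole C4.5
analysis on each permuted dataset, which is what the hypothetical
`fit_and_score` callback leaves room for.

```python
import random


def rule_fires(row):
    """The example rule from the question, applied to one client record."""
    return (row["AGE"] > 32
            and row["NUM_OF_CHILDREN"] > 2
            and row["CLIENT_PROFESSION"] == "DOCTOR"
            and row["GENDER"] == "MALE")


def rule_accuracy(rows, labels):
    """Share of covered cases labelled GOOD (stand-in for a full C4.5 rerun)."""
    covered = [label for row, label in zip(rows, labels) if rule_fires(row)]
    return sum(label == "GOOD" for label in covered) / len(covered)


def permutation_test(rows, labels, fit_and_score, n_permutations=1000, seed=0):
    """Shuffle the target labels repeatedly and rescore each time.

    Returns the observed score, the list of scores under permutation,
    and an empirical p-value.
    """
    rng = random.Random(seed)
    observed = fit_and_score(rows, labels)
    shuffled = list(labels)  # copy, so the caller's labels stay intact
    null_scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between features and target
        null_scores.append(fit_and_score(rows, shuffled))
    at_least = sum(score >= observed for score in null_scores)
    # Add one to numerator and denominator so the estimate is never zero.
    p_value = (at_least + 1) / (n_permutations + 1)
    return observed, null_scores, p_value


# Synthetic data (assumption): 700 cases, 25% covered by the rule,
# covered cases GOOD with probability 0.85, all others with 0.40.
data_rng = random.Random(1)
rows, labels = [], []
for i in range(700):
    in_rule = i < 175
    rows.append({
        "AGE": 40 if in_rule else 30,
        "NUM_OF_CHILDREN": 3 if in_rule else 1,
        "CLIENT_PROFESSION": "DOCTOR" if in_rule else "OTHER",
        "GENDER": "MALE",
    })
    p_good = 0.85 if in_rule else 0.40
    labels.append("GOOD" if data_rng.random() < p_good else "BAD")

observed, null_scores, p_value = permutation_test(rows, labels, rule_accuracy)
print(f"observed accuracy: {observed:.3f}")
print(f"largest permuted accuracy: {max(null_scores):.3f}")
print(f"empirical p-value: {p_value:.4f}")
```

The empirical p-value is the share of permuted accuracies at or above
the observed one. Because the rule is not re-derived on each shuffle,
this sketch ignores the search the tree performed to find the rule and
will tend to look more significant than a full rerun would.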

Hi Ray and thanks for your reply.


If I understood correctly, you are basically saying to do some sort of
cross-validation and keep track of how the 85% accuracy changes in each
fold. Of course, I am not a statistician, but this seems to me something
like an "empirical" rule.

With a chi-square test you are able to quantify statistical
significance and present your findings (say, in a scientific paper),
but how can I quantify the significance in the way you described?


Again, sorry if I am totally mistaken about this.



Thanks,


Hgwelec
