Re: assumption of Classification



On 26 Apr 2005 22:21:23 -0700, "Data Matter" <fungile@xxxxxxxxx>
wrote:

> He's asking if these procedures make distributional assumptions.
> Classification trees do not. Most clustering algorithms (k-means,
> single link, average link, etc.) do not. However, there is a class of
> clustering algorithms which assume that each cluster is multivariate
> normal and then proceed to find the means and covariances of these
> clusters.

A classification tree that tries to break at every value
will not care whether the distance between 1 and 10 is
the same as the distance between 10 and 100 (or not),
because only the order of the values affects where it can split.
(It has many opportunities to over-capitalize on chance,
so N needs to be large.)
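To illustrate the point about order, here is a minimal sketch, assuming one predictor, 0/1 class labels, and Gini impurity as the split criterion (the post does not name a criterion; `gini` and `best_split` are hypothetical helpers). Because every candidate cut falls between adjacent sorted values, a monotone rescaling such as log10 leaves the chosen partition unchanged.

```python
import math


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(x, y):
    """Try a cut between every pair of adjacent distinct values of x.

    Returns the set of indices sent to the left branch by the cut with
    the lowest weighted Gini impurity. Only the ORDER of x is used, so
    ties between candidate cuts also resolve the same way after any
    strictly increasing transformation of x.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no cut is possible between tied values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score, best_left = score, frozenset(order[:k])
    return best_left


x = [1, 3, 10, 30, 100, 300, 1000]
y = [0, 0, 1, 0, 1, 1, 1]

raw = best_split(x, y)
logged = best_split([math.log10(v) for v in x], y)
print(raw == logged)  # True
```

The spacing between 1, 10, 100 and 1000 never enters the calculation; the same is true of the over-capitalizing risk, since a tree free to cut anywhere has one candidate cut per distinct value.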

A classification tree that uses the mean will have some of
the same difficulty that "link" clustering does, if it wrongly
assumes that equal intervals on the measurement scale are
equivalent.
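A minimal sketch of that difficulty, assuming "uses the mean" means cutting a predictor at its sample mean (one reading of the post; `split_at_mean` is a hypothetical helper). Unlike the exhaustive split, the mean depends on the spacing of the values, so the same four cases are partitioned differently on the raw and log10 scales.

```python
import math
from statistics import mean


def split_at_mean(values):
    """Partition indices into (at or below the mean, above the mean)."""
    m = mean(values)
    below = [i for i, v in enumerate(values) if v <= m]
    above = [i for i, v in enumerate(values) if v > m]
    return below, above


x = [1, 10, 100, 1000]
print(split_at_mean(x))                           # ([0, 1, 2], [3])
print(split_at_mean([math.log10(v) for v in x]))  # ([0, 1], [2, 3])
```

If only the order of the values is trustworthy, neither partition is privileged, and the choice of scale silently decides the split.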

>
> Nonetheless, normality is not the only assumption to be checked. Every
> method has its own list of assumptions and you should make sure that
> your data agree with the method you choose.
>

It's always good to check.

For ordinary least squares methods, normality is not
as important as having decently behaved residuals: mainly,
no outliers and no pattern. And that behavior
matters for the *tests*, not for carrying out the fit.
[ ... ]
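A minimal sketch of the distinction, assuming a single predictor and a crude standardized-residual screen (`ols_line`, `flag_outliers` and the 2.0 cutoff are hypothetical choices for illustration, not a standard rule). The fit itself is pure algebra with no distributional assumption; the residuals are what you inspect before trusting any test built on the fit.

```python
from statistics import mean, stdev


def ols_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x.

    This only minimizes the sum of squared residuals; no normality
    (or any other distributional) assumption is used here.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def flag_outliers(x, y, a, b, cutoff=2.0):
    """Indices whose residual exceeds `cutoff` SDs of the residuals.

    The 2.0 cutoff is an illustrative assumption.
    """
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = stdev(resid)
    return [i for i, r in enumerate(resid) if abs(r) > cutoff * s]


x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[7] += 20  # one gross outlier

a, b = ols_line(x, y)
print(round(b, 3))                # 2.606, pulled away from the true slope 2
print(flag_outliers(x, y, a, b))  # [7]
```

The fit runs regardless; the single bad point shifts the slope and inflates the residual variance, which is what would distort standard errors and tests.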

--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html