Re: clustering
- From: "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx>
- Date: 12 Dec 2005 08:41:58 -0800
Art Kendall wrote:
> To explicate and put in some other words what I said and what I read
> into the responses of others.
>
> AJK: Don't put too much faith in any solution.
No argument about that, but that's NOT a faithful paraphrase of what
you said about k-means clustering.
> AJK & John Uebersax: In an agglomerative hierarchical solution in
> going from a five cluster step to a 4 cluster step, 3 clusters remain
> the same and two are put together. You commonly get the whole tree
> starting with 1 case per cluster combining clusters until you only have
> one. (In divisive approaches you start with 1 cluster and split in a
> series of steps until each cluster is 1 case.) In a non-hierarchical
> approach in going from a 5 cluster solution to a 4 cluster solution any
> of the cases can go into any of the clusters.
>
> John Uebersax: The tree is the goal in hierarchical clustering not a
> single slice.
All of the above characterizations are FAULTY, both in form and in
substance, relative to "clustering".
1. A "tree diagram" or a "dendrogram" is nothing more than a
graphical representation of a history of hierarchical STEPS in
a clustering ALGORITHM -- it does NOT even imply that there
is ANY clustering phenomenon.
2. Simulate any group of completely RANDOM points inside a
circular area and cluster them by ANY of the hierarchical
clustering algorithms in SAS, SPSS, or whatever. The
"history" of the clustering will be representable by a tree
diagram even though the configuration is nothing but a
"random scatter" or holes put through the wall by a shotgun
blast.
3. The SERIOUS side of (1) and (2) above is the mistaken
notion (as given by Uebersax in his post) that the tree itself
is the solution, an interpretation which is well known to be
fallacious in the legitimate literature in cluster analysis.
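
Point (2) can be tried directly. What follows is a minimal sketch, not any package's implementation: it uses only the Python standard library, a naive single-linkage rule chosen purely for illustration (SAS and SPSS offer many other linkages), and hypothetical helper names (random_points_in_disk, single_linkage_history). It scatters points uniformly inside a unit circle and records the merge history that a dendrogram would draw.

```python
import math
import random


def random_points_in_disk(n, seed=0):
    """Draw n points uniformly inside the unit circle by rejection sampling."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            points.append((x, y))
    return points


def single_linkage_history(points):
    """Naive agglomerative clustering; returns one record per merge step.

    Each record is (cluster_id_a, cluster_id_b, merge_distance, new_size),
    which is the information a dendrogram displays.
    """
    clusters = {i: [i] for i in range(len(points))}
    history = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        history.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return history


points = random_points_in_disk(30)
history = single_linkage_history(points)
# A complete tree (n - 1 merges) always comes out, structure or no structure.
print(len(history), "merges for", len(points), "random points")
```

The algorithm returns a full merge history for pure random scatter, so the existence of the tree says nothing about whether any clusters exist.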
> Reef Fish: Don't put too much faith in any solution. There are kinds of
> clusters that are not easily detected by widely used algorithms.
The first sentence is a reasonable summary of my decades of
research and publications on the subject of "clustering", if you
delete the word "too". That is not hyperbole at all!
I gave up my aspiration of bringing some order and insight
into the subject of "cluster analysis" about a DECADE after I
completed my doctoral dissertation on the subject, "Cluster
Analysis" (1970), having spent years of serious attempts at
cracking the impenetrable shell of the problem. Any
4-year-old can understand what the problem IS (finding
groups of objects), but none of the best minds in the field
has put a dent in the solution most 4-year-olds can find
(as in the example I gave: suppose you drop a bunch of
candies on the floor and they form the two groups, or
some other equally recognizable groups, that have stumped
ALL existing clustering algorithms!).
I gave up all hopes of seeing order or SUBSTANCE in the
applications of "cluster analysis" after I was the Program
Chairman of the First meeting of the International Federation
of Classification Societies in 1989, consisting of members
of Classification Societies in the world at the time.
http://www.classification-society.org/
Come to think of it, change it to
"Don't put ANY faith in any solution" until you have carefully
scrutinized the clustering "solution" by external validation,
independent of the clustering process/algorithm itself.
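
One way to make "external validation" concrete is to compare the clustering with a grouping obtained independently of the algorithm. The sketch below is an assumption about what such a check could look like, not a method prescribed in this thread: it computes the plain (unadjusted) Rand index, the fraction of object pairs on which two labelings agree, using a hypothetical helper rand_index and made-up toy labels.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs treated the same way by both labelings.

    A pair counts as an agreement when both labelings put the two objects
    together, or both put them apart. Needs at least two objects.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy data (illustrative only): an external grouping and an algorithm's output.
external = ["a", "a", "a", "b", "b", "b"]
found = [0, 0, 1, 1, 2, 2]
print(rand_index(external, found))  # 10 of the 15 pairs agree
```

The unadjusted index is not corrected for chance agreement, so a high value alone is weak evidence; it is shown only because it is short enough to check by hand.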
In Art's second sentence, it is imperative to drop the word "easily".
They are simply NOT detectable, partly because of the lack
of a "definition" of what constitutes a "cluster", and it can be
empirically shown that NONE of the existing programs
(I know of at least several hundred DIFFERENT clustering
algorithms) will find the otherwise "obvious" clusters!
-- Bob.
>
> Art
> Art@xxxxxxxxxxxxx
> Social Research Consultants
>
>
>
> John Uebersax wrote:
>
> > (This repeats some of what Art said, but as I have already composed
> > this message I'll go ahead and post it anyway.)
> >
> > To add to my previous reply, hierarchical cluster analysis (HCA) gives
> > a series of hierarchically related partitionings. In a sense, the
> > solution is the entire structure of these partitionings, as, say,
> > expressed in a dendrogram:
> >
> > 1   2   3   4   5
> > *   *   *   *   *
> > *   *   *****   *
> > *   *     *     *
> > *****     *     *
> >   *       *     *
> >   *       *******
> >   *          *
> >   ************
> >
> > This is more, and a different kind of information than non-HCA
> > produces. It shows the hierarchical structure of alternative taxonic
> > levels.
> >
> > For example, if objects are biological organisms, HCA produces
> > divisions that might correspond to family, genus, species, subspecies,
> > etc. in a single solution.
> >
> > --
> > John Uebersax PhD
> >