Re: Clustering Software



On 10 Jun 2006 14:42:32 -0700, "Reef Fish"
<Large_Nassau_Grouper@xxxxxxxxx> wrote:


Richard Wright wrote:
Are you using 'primitive' as a description or a condemnation?

It was certainly a concise and accurate description.

You can imagine the rest any way you like.


In the evolutionary sense things can be primitive yet satisfactory.

just as there are things primitive that are no longer satisfactory.

The human hand is about as primitive a forelimb as you can find


you can still rub a stick to make a fire (when you have no other way)
or use a piece of rock for a cutting tool.

Good one, Bob. You snip part of what I said after that quote, and then
make the very same point in an attempt to rebut my truncated
statement!

What I said was: "The human hand is about as primitive a forelimb as
you can find among mammals - primitive in the sense that it is close
to the ancestral vertebrate form and can perform diverse tasks -
picking up food, flying aircraft, and carrying out unmentionable
acts."



By contrast the horse's hoof is about as advanced/sophisticated as you
can get, but it can do almost nothing except run and kick.

What's so advanced/sophisticated about a horse's hoof?


It is advanced because it has evolved far away from the ancestral
form. It is sophisticated because it is highly adapted for its
(limited) function. The hands of members of this newsgroup are
primitive, but their brains are advanced.

It may or may not be desirable for a property to be primitive. It may
or may not be desirable for a property to be advanced. That evaluation
depends on adaptedness under constantly changing circumstances. BTW,
the customary evolutionary terms are 'primitive' and 'derived', but I
used the more intuitive word 'advanced' because I suspected these
matters were outside your field.


I thought a horse's mouth was much more advanced and sophisticated,
because EVERYONE would rather hear it from the horse's mouth than
from some other source.


I am out of my field here,

You could've fooled me had you not been so honest as to confess.


but am wondering this. Should I assume that
primitive equates with uselessness in clustering algorithms? Has the
average linkage algorithm been shown to be unsatisfactory in
applications?

Sorry you missed my previous posts on the subject, in which I had
challenged some poster(s) to name ONE single discovery in the
past decades that could be attributed to the use of clustering
algorithms, of which min, ave, and max are the most commonly
used and the CRUDEST.

Back to statistics. Sorry, I did miss the earlier posts. But I am
still none the wiser in relation to my particular question. Is there
a consensus that average linkage clustering produces worse results
(empirically) than more advanced methods?


Please don't come back with a new thesis that CRUDE oil is such
an essential commodity that it's ruling the present world because
the US economy is going down the drain because of rising CRUDE
oil prices ... that "crude" is a desirable property just as "primitive"
is.

That's your crazy idea, not mine.


-- Bob.



Maybe I have misunderstood things and you are just
having a go at the ironical self-effacement that I saw in John's claim
about one-time sophistication.


On 10 Jun 2006 11:28:17 -0700, "Reef Fish"
<Large_Nassau_Grouper@xxxxxxxxx> wrote:


John Uebersax wrote:
My program, CLUSBAS, is now uploaded and can be retrieved here:

http://ourworld.compuserve.com/homepages/jsuebersax/clusbas.zip

Details:

1. Average-linkage, hierarchical cluster analysis
2. Interactive, very easy to use (two commands!)
3. User supplies similarity/dissimilarity matrix
4. Runs in DOS window (should be no problem)
5. Limited to 100 objects (designed for variable/item clustering),
but this can be increased.
6. Source code included (two versions: fortran and QuickBasic)

Note: this was once a very sophisticated mainframe program,

You're joking of course.

The "average" linkage has the same algorithm as the "minimum"
and the "maximum" linkage (the only difference is updating the
new cluster value by "average" rather than "min" or "max"), and
is the most primitive of ALL clustering algorithms.

        1     2     3     4
1    1.00   .50   .76   .50
2     .50  1.00   .67   .50
3     .76   .67  1.00   .50
4     .50   .50   .50  1.00

--->
          (1,3)      2      4
(1,3)         1   .585     .5
2                    1     .5
4                           1

--->
        (1,2,3)      4
(1,2,3)       1     .5
4                    1

which is what your program did on a similarity matrix of
correlations (since the self-similarities are 1.00).
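The single reduction step in the worked example can be sketched in Python. This is an illustrative sketch, not CLUSBAS code; the helper name `reduce_once` and the list-of-lists matrix layout are assumptions. It merges the most similar off-diagonal pair and replaces their two rows by the simple average of the old values, which reproduces the .76 merge and the .585 entry above.

```python
def reduce_once(labels, sim):
    """One average-linkage reduction step on a similarity matrix (hypothetical helper).

    labels: list of tuples naming the objects in each cluster.
    sim:    square list-of-lists of similarities (larger = more alike).
    Returns the new labels, the reduced matrix, and the similarity at which the pair merged.
    """
    n = len(labels)
    # Pick the off-diagonal pair with the largest similarity.
    i, j = max(((p, q) for p in range(n) for q in range(p + 1, n)),
               key=lambda pq: sim[pq[0]][pq[1]])
    keep = [k for k in range(n) if k not in (i, j)]
    new_labels = [labels[i] + labels[j]] + [labels[k] for k in keep]
    m = len(new_labels)
    new_sim = [[1.0] * m for _ in range(m)]  # diagonal stays 1.0, as for correlations
    for a, k in enumerate(keep, start=1):
        # Merged cluster vs. cluster k: simple average of the two old similarities.
        new_sim[0][a] = new_sim[a][0] = (sim[i][k] + sim[j][k]) / 2
        for b, other in enumerate(keep, start=1):
            if a != b:
                new_sim[a][b] = sim[k][other]  # pairs not involved are copied over
    return new_labels, new_sim, sim[i][j]


R = [[1.00, 0.50, 0.76, 0.50],
     [0.50, 1.00, 0.67, 0.50],
     [0.76, 0.67, 1.00, 0.50],
     [0.50, 0.50, 0.50, 1.00]]

labels, R2, merged_at = reduce_once([(1,), (2,), (3,), (4,)], R)
print(labels, merged_at)  # [(1, 3), (2,), (4,)] 0.76
print([[round(x, 3) for x in row] for row in R2])
```

Averaging the two old values in this way is sometimes called WPGMA; a size-weighted average (UPGMA) differs only once a multi-member cluster is merged, and on this particular matrix both give the same numbers. Which variant CLUSBAS uses is not stated in the post.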

If the algorithm had been MIN, the value .585, which is
AVE(.50, .67), would have been MIN(.50, .67) = .50,
and the MAX algorithm would have yielded MAX(.50, .67) = .67.
Either way the similarity matrix shrinks by one row and column, and the
identical algorithm continues until the last two groups are merged into one.

Can't find a LESS sophisticated, or more simple-minded, algorithm
than those three!

Even Afonso should be able to program ALL THREE algorithms
in one (offering a choice of 1 for min, 2 for ave and 3 for max)
in about 15 minutes or less. :-)
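Taking up the challenge, here is a minimal Python sketch of the three-in-one program: one loop, with the update rule chosen by 1 (min), 2 (ave) or 3 (max). The names `cluster` and `UPDATE` are hypothetical, and "ave" here is the simple two-value average used in the worked example. On a similarity matrix the min rule corresponds to what is usually called complete linkage and the max rule to single linkage; on a dissimilarity matrix the correspondence is reversed.

```python
# Update rules for the merged cluster's similarity to each remaining cluster.
UPDATE = {1: min, 2: lambda x, y: (x + y) / 2, 3: max}


def cluster(sim, choice=2):
    """Agglomerate objects 1..n from a similarity matrix; return the merge history.

    choice: 1 = min, 2 = average, 3 = max update of the merged cluster's row.
    Each merge is reported as (cluster_a, cluster_b, similarity).
    """
    update = UPDATE[choice]
    clusters = [(k + 1,) for k in range(len(sim))]
    # s[a][b] holds the current similarity between clusters a and b.
    s = {a: {b: sim[a[0] - 1][b[0] - 1] for b in clusters if b != a} for a in clusters}
    merges = []
    while len(clusters) > 1:
        # Merge the most similar pair (ties go to the first pair found).
        a, b = max(((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
                   key=lambda pair: s[pair[0]][pair[1]])
        merges.append((a, b, s[a][b]))
        ab = tuple(sorted(a + b))
        others = [c for c in clusters if c not in (a, b)]
        # The matrix shrinks by one: rows a and b are replaced by row ab.
        s[ab] = {}
        for c in others:
            s[ab][c] = s[c][ab] = update(s[a][c], s[b][c])
            del s[c][a], s[c][b]
        del s[a], s[b]
        clusters = [ab] + others
    return merges


R = [[1.00, 0.50, 0.76, 0.50],
     [0.50, 1.00, 0.67, 0.50],
     [0.76, 0.67, 1.00, 0.50],
     [0.50, 0.50, 0.50, 1.00]]

for choice in (1, 2, 3):
    print("choice", choice)
    for a, b, v in cluster(R, choice):
        print(f"  {a} joined by {b}  SIM = {v:.3f}")
```

With choice 2 the merge sequence (1 with 3 at .760, then 2 at .585, then 4 at .500) agrees with the CLUSBAS sample output in this thread. Choice 3 merges object 2 at .670 instead, and choice 1 produces a three-way tie at .500 at the second step, which this sketch breaks by taking the first pair found.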

-- Bob.


with fancy
CalComp plots and the whole works. I haven't fully ported it to PCs
because I thought something better would come along--but it really
hasn't.

However, if there's interest I can restore more of the original program
features in the PC version.

Sample input:

1.00 .50 .76 .50
.50 1.00 .67 .50
.76 .67 1.00 .50
.50 .50 .50 1.00

Sample output:

GROUP 1 IS JOINED BY GROUP 3. N IS 2 ITER = 1 SIM = 0.760
GROUP 1 IS JOINED BY GROUP 2. N IS 3 ITER = 2 SIM = 0.585
GROUP 1 IS JOINED BY GROUP 4. N IS 4 ITER = 3 SIM = 0.500

    1   3   2   4
    *   *   *   *
1   *****   *   *
      *     *   *
2     *******   *
         *      *
3        ********

Hope this helps.

John Uebersax PhD
