Re: clustering question
- From: Aniko <aniko123_57@xxxxxxxxx>
- Date: Mon, 12 May 2008 08:22:29 -0700 (PDT)
On May 11, 7:54 am, "ozgun.harmanci" <ozgun.harma...@xxxxxxxxx> wrote:
Hello,
We have been clustering data to compare samples generated by two
different methods. Each method generates a sample (x_1, say), which we
cluster using diana from the R cluster package; we then determine the
optimal clustering by maximizing the Calinski-Harabasz index (as
calculated by R). diana (DIvisive ANAlysis) is a hierarchical divisive
clustering method that computes a tree, or dendrogram.
Our hypothesis is that one method should generate data which is less
scattered, meaning that cluster analysis should yield fewer clusters.
However, when we ran the cluster analysis on the generated samples,
we saw no clear difference in the number of clusters. But if I look
at the trees generated by diana, it is obvious to me that the method
we expect to have fewer clusters has less spread in the tree.
I am thinking that we should also use the variance of the data within
the clusters, in addition to the number of clusters, to compare the
sampling methods. I could not, however, find a theoretically grounded
way to do that. Could you suggest ideas, papers, or books to follow up
on this problem?
I hope this makes sense.
Arif.
You need a good definition of "less scattered" and then compare based
on that definition. For example, would comparing the variance work?
Aniko
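
To make the variance suggestion concrete, here is a minimal sketch of two
possible definitions of "scatter". It is an illustration only, not the
method either poster used: the function names, the toy samples, and the
choice of Python rather than R are all assumptions. The first measure is
the total variance of a sample (the sum of the per-coordinate sample
variances), which needs no clustering at all. The second is the pooled
within-cluster variance, which takes cluster labels as given (for
example, from cutting a diana tree).

```python
from statistics import fmean


def _sum_sq_about_centroid(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    dims = len(points[0])
    centroid = [fmean(p[d] for p in points) for d in range(dims)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dims))


def total_variance(points):
    """Sum of per-coordinate sample variances (trace of the covariance matrix)."""
    return _sum_sq_about_centroid(points) / (len(points) - 1)


def pooled_within_variance(points, labels):
    """Within-cluster sum of squares divided by (n - k), for k clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    wss = sum(_sum_sq_about_centroid(members) for members in clusters.values())
    return wss / (len(points) - len(clusters))


# Hypothetical toy samples standing in for the output of the two methods.
x_1 = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
x_2 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

print(total_variance(x_1), total_variance(x_2))
```

Under the first definition, the sample with the smaller total variance is
the less scattered one. The pooled within-cluster variance is the
denominator of the Calinski-Harabasz index, so it fits naturally with the
clustering already being done; whether either definition matches the
intended meaning of "less scattered" is for the original poster to decide.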