Re: k-means or not?
- From: "John Uebersax" <jsuebersax@xxxxxxxxx>
- Date: 18 Nov 2005 23:48:40 -0800
It would be more usual to analyze the data with hierarchical cluster
analysis (HCA), e.g., average-linkage or single-linkage. Here's
how:
1. From raw data, construct a matrix of co-occurrence frequencies, F,
between each pair of methods, where f(i,j) is the number of times
method i occurs with method j. This matrix is symmetrical.
2. From F, produce a proximity matrix, P, by adjusting each element
for the marginal row and column frequencies. That is, adjust f(i,j) by
the numbers of times method i and method j occur overall--f(i) and
f(j).
Note: This step is where the 'art' comes in. There are several ways
to make the adjustment and you need to select one suitable for your goals.
Some examples are:
p(i,j) = f(i,j) / sqrt[f(i) * f(j)]
p(i,j) = f(i,j) / [f(i) + f(j)]
p(i,j) = f(i,j) / min[f(i), f(j)]
You might get ideas by checking the literature on cluster analysis
and/or multidimensional scaling of co-occurrence matrices.
3. Use HCA to analyze the matrix P. You need software that lets you
supply a proximity matrix rather than raw data. SAS will let you do
this. If you don't have too many methods (< 50) I also have a program
at StatLib for this.
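
Steps 1-3 can be sketched in a few lines of plain Python. This is only
an illustration under stated assumptions, not the StatLib program or
the SAS procedure: it assumes the raw data are a list of records, each
a set of the methods that occur together, and the names cooccurrence,
proximity and average_linkage are made up for the example. Because P
holds similarities, the sketch merges the pair of clusters with the
HIGHEST average proximity at each step.

```python
from itertools import combinations
from math import sqrt

def cooccurrence(records):
    """Step 1: count f(i) and f(i,j) from records (sets of method names)."""
    methods = sorted(set().union(*records))
    f = {m: 0 for m in methods}
    F = {(a, b): 0 for a in methods for b in methods if a != b}
    for rec in records:
        for m in rec:
            f[m] += 1
        for a, b in combinations(sorted(rec), 2):
            F[(a, b)] += 1
            F[(b, a)] += 1  # keep F symmetrical
    return methods, f, F

def proximity(f, F, adjust="sqrt"):
    """Step 2: adjust f(i,j) for the marginal frequencies f(i) and f(j)."""
    rules = {
        "sqrt": lambda fij, fi, fj: fij / sqrt(fi * fj),
        "sum": lambda fij, fi, fj: fij / (fi + fj),
        "min": lambda fij, fi, fj: fij / min(fi, fj),
    }
    rule = rules[adjust]
    return {(a, b): rule(fij, f[a], f[b]) for (a, b), fij in F.items()}

def average_linkage(names, P, n_clusters):
    """Step 3: naive average-linkage HCA on a similarity dict P[(a, b)]."""
    clusters = [[n] for n in names]
    merges = []

    def avg_sim(c1, c2):
        # Mean proximity over all cross-cluster pairs.
        return sum(P[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda ij: avg_sim(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], avg_sim(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

# Made-up toy data: each record lists the methods that occurred together.
records = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}, {"A", "B"}, {"C", "D"}]
methods, f, F = cooccurrence(records)
P = proximity(f, F, "sqrt")
clusters, merges = average_linkage(methods, P, n_clusters=2)
print(clusters)
```

The naive search over all cluster pairs is slow for large problems but
fine for a few dozen methods. If you use a library routine instead,
check whether it expects similarities or distances; many expect
distances, in which case P has to be converted first.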
If for some reason you strongly prefer k-means, there is a trick you
could use: first submit the P matrix to multidimensional scaling. The
scaling would convert the proximities into a set of coordinates for each
method. These coordinates could then be used in k-means clustering.
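
A rough sketch of that trick, using NumPy. Several things here are my
assumptions rather than part of the recipe: the proximities are taken
to lie in [0, 1] and are turned into distances with D = 1 - P, the
scaling is classical (Torgerson) MDS rather than a nonmetric variant,
and k-means is a bare-bones Lloyd's algorithm. The function names and
the small P matrix are invented for the example.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: turn a distance matrix into coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_dims]     # indices of the largest ones
    # Clip tiny negative eigenvalues that arise from rounding or non-Euclidean D.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; initial centers are k randomly chosen points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Made-up proximity matrix for four methods A, B, C, D.
P = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
])
D = 1.0 - P                 # assumption: similarity in [0, 1] -> distance
np.fill_diagonal(D, 0.0)
X = classical_mds(D, n_dims=2)
labels = kmeans(X, k=2)
print(labels)
```

Keep in mind that k-means depends on its starting centers, so with real
data it is worth running it from several seeds and comparing the
solutions.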
--
John Uebersax PhD