Re: clustering points up to a sample size
From: Mitch Harris (harrisq_at_tcs.inf.tu-dresden.de)
Date: 02/22/05
- Next message: Larry Hammick: "Re: Is this set compact?"
- Previous message: Mitch Harris: "Re: What is a Closed set of Boolean Functions ?"
- In reply to: xyzzy12_at_hotmail.com: "clustering points up to a sample size"
- Next in thread: Mitch Harris: "Re: clustering points up to a sample size"
- Reply: Mitch Harris: "Re: clustering points up to a sample size"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 22 Feb 2005 11:18:45 +0100
xyzzy12@hotmail.com wrote:
> I have a simple need. Given a large amount of (x,y) tuples, I want to
> cluster them into sample sizes of at least a certain size S. The
> distance measurement will be Euclidean Distance.
>
> Essentially, the algorithm is this:
> Given a set of (x,y) tuples make each one a cluster of size one.
> repeat
> for each cluster whose size is < S
> find the closest cluster by measuring the distance between
> their centroids
> merge with the closest cluster and adjust centroid
> until all clusters have a size >= S
>
> What is this algorithm called?
formally it is called hierarchical, agglomerative clustering
> I've coded it in Java, but its
> basically O(n^3) speed. Is there a faster way?
Given n points, you're computing O(n^2) distances, then choosing a
closest pair (for some special choice of "closest") by scanning, then
recomputing distances, n-1 times. That's O(n^3).
Being clever you can keep the distances in a priority queue O(n^2 log
n) to insert them all, O(log n) to extract the "closest", then O(n^2
log n) to update distances (priorities). That's O(n^2 log n)
This does not depend on your cluster criterion (you can use centroid,
or weighted centroid (depends on the number of points in the cluster),
or nearest neighbor (updating might be slower here) or farthest
neighbor or whatever with this neighbor).
But you may be able to do better in specific instances. If you're
doing nearest neighbor ("simple (or also called single) linkage"),
then your hierarchy tree is a minimum spanning tree for which you can
use off the shelf algorithms for O(E log V) or O(n^2 log n) (same as
above).
-- Mitch Harris (remove q to reply)
- Next message: Larry Hammick: "Re: Is this set compact?"
- Previous message: Mitch Harris: "Re: What is a Closed set of Boolean Functions ?"
- In reply to: xyzzy12_at_hotmail.com: "clustering points up to a sample size"
- Next in thread: Mitch Harris: "Re: clustering points up to a sample size"
- Reply: Mitch Harris: "Re: clustering points up to a sample size"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|