Re: NP-hardness of k-means clustering
From: Stas Busygin (busygin_at_a-teleport.com)
Date: 09/14/04
- Next message: mensanator_at_aol.com: "Re: Harassment of James Harris: A necessary rebuke to those who mocked his veterans status."
- Previous message: Acid Pooh: "Re: Uncountable sets in CZF?"
- In reply to: Kent Paul Dolan: "Re: NP-hardness of k-means clustering"
- Next in thread: Eray Ozkural exa: "Re: NP-hardness of k-means clustering"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 14 Sep 2004 18:54:33 -0400
Kent Paul Dolan wrote:
> "Stas Busygin" <busygin@a-teleport.com> wrote:
>
>>Kent Paul Dolan wrote:
>
>>>busygin@a-teleport.com (Stas Busygin) wrote:
>
>>>>Is there a published proof that the K-Centroid problem
>>>>is NP-hard?
>
>>>>[K-Centroid Problem]
>>>>Given n points in R^m, find k other points in R^m
>>>>(centroids) such that the sum of distances from each
>>>>of the given points to its closest centroid is minimal.
>
>>>I found this, indirect reference to a paper
>>>document, which to my vocabulary-deficient mind
>>>sounds like it may be what you are seeking:
>
>>>:- Clustering can be considered as an optimization
>>>:- problem in which an assignment of data vectors to
>>>:- clusters is desired, such that the sum of squared
>>>:- distances of the vectors to their cluster mean
>>>:- (centroid) is minimal.
>
>>>:- This combinatorial problem is called the minimum
>>>:- sum-of-squares clustering (MSSC) problem and is
>>>:- known to be NP-hard [1].
>
>>>:- 1. Brucker, P.: On the Complexity of Clustering
>>>:- Problems. Lecture Notes in Economics and
>>>:- Mathematical Systems 157 (1978) 45{54
>>
>>Very helpful paper to learn about advanced
>>heuristics for the K-centroids, thanks for the
>>ref! But the Brucker's paper from 1978 proves
>>NP-hardness of a bit different problem, where the
>>minimized objective is the sum of distances
>>between items within each cluster, and when the
>>distances are Eucledian, the complexity is not
>>decided. So, I still wonder is there a proof that
>>global optimization for k-means clustering is
>>NP-hard.
>
> Many references to Brucker's paper seem to make a
> much stronger claim, and indeed the quote above that
> references it seems to match your original
> definition (distance from a centroid), not the one
> you give here (distance among points in the
> cluster). Since you have a lot better chance of
> picking out the crucial words than I do, try reading
> through these places where Brucker's paper is cited:
>
> http://citeseer.ist.psu.edu/context/152126/0
>
> which if nothing else gives you a lot more papers to
> read!!!
>
> This, in particular, from that URL, seems to imply
> ("some optimisation function") that Brucker's result
> is (much) stronger than you believe:
I don't just believe, I have now the Brucker's paper before my
eyes :). The fact is that (i) Brucker don't consider
centroid-related problems at all, and (ii) the last sentence of
his paper is "Another open question is which Euclidean problems
are NP-complete." The papers citing Brucker that I just looked
through also don't mention centroids...
>>As it turns out, the problem of finding a partition
>>that minimizes some optimization function is NP
>>complete (Brucker, 1978)
>
>
> But I confess to being unable even to distinguish
> NP-complete from NP-hard, so I'm not sure if that's
> helpful or even pertinent.
Loosely speaking, people call NP-hard problems NP-complete. But
formally, an NP-complete problem is required to belong to NP
(i.e. to be a decision (Yes/No) problem having a
polynomial-time verifiable certificate for positive answer),
while an NP-hard problem is any problem not easier than any
NP-complete problem. Of course, any optimization problem is not
a decision problem, and when one calls it NP-hard, it usually
means that asking whether there is a feasible solution with
objective value not worse than some K is NP-complete.
Best,
- Next message: mensanator_at_aol.com: "Re: Harassment of James Harris: A necessary rebuke to those who mocked his veterans status."
- Previous message: Acid Pooh: "Re: Uncountable sets in CZF?"
- In reply to: Kent Paul Dolan: "Re: NP-hardness of k-means clustering"
- Next in thread: Eray Ozkural exa: "Re: NP-hardness of k-means clustering"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|