Faster Algorithms for the Constrained k-means Problem

Data Structures and Algorithms 2015-04-13 v1

Abstract

The classical center-based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. Consider a variant of the $k$-means problem that may be regarded as a general version of such problems. Here, the optimal clusters $O_1, ..., O_k$ are an arbitrary partition of the dataset and the goal is to output $k$-centers $c_1, ..., c_k$ such that the objective function $\sum_{i=1}^{k} \sum_{x \in O_{i}} ||x - c_{i}||^2$ is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of $k$ centers will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such $k$-centers such that at least one of these $k$-centers behaves well. Given an error parameter $\varepsilon > 0$, let $\ell$ denote the size of the smallest list of $k$-centers such that at least one of the $k$-centers gives a $(1+\varepsilon)$ approximation w.r.t. the objective function above. In this paper, we show an upper bound on $\ell$ by giving a randomized algorithm that outputs a list of $2^{\tilde{O}(k/\varepsilon)}$ $k$-centers. We also give a closely matching lower bound of $2^{\tilde{\Omega}(k/\sqrt{\varepsilon})}$. Moreover, our algorithm runs in time $O \left(n d \cdot 2^{\tilde{O}(k/\varepsilon)} \right)$. This is a significant improvement over the previous result of Ding and Xu, who gave an algorithm with running time $O \left(n d \cdot (\log{n})^{k} \cdot 2^{\mathrm{poly}(k/\varepsilon)} \right)$ that outputs a list of size $O \left((\log{n})^k \cdot 2^{\mathrm{poly}(k/\varepsilon)} \right)$.
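To make the objective concrete, the following is a minimal Python sketch, not the algorithm from the paper. It evaluates $\sum_{i=1}^{k} \sum_{x \in O_{i}} ||x - c_{i}||^2$ for a fixed partition and picks the best entry from a list of candidate $k$-centers. The function names (`cost`, `centroid`, `best_in_list`) and the toy data are assumptions chosen for illustration. The only fact relied on is the standard one that, for a fixed cluster, the centroid minimizes the sum of squared distances.

```python
def cost(partition, centers):
    """Sum over clusters O_i of the squared distances from each x in O_i to c_i."""
    total = 0.0
    for cluster, c in zip(partition, centers):
        for x in cluster:
            total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total


def centroid(cluster):
    """Coordinate-wise mean of a non-empty cluster of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def best_in_list(partition, candidates):
    """Return the candidate k-centers with the smallest cost on the given partition."""
    return min(candidates, key=lambda centers: cost(partition, centers))


# Hypothetical toy instance: a partition that violates locality
# (each cluster mixes a point from the left group and one from the right).
O1 = [(0.0, 0.0), (4.0, 0.0)]
O2 = [(1.0, 0.0), (5.0, 0.0)]
partition = [O1, O2]

# Centers a locality-based clustering of {0, 1} and {4, 5} would pick.
locality_centers = [(0.5, 0.0), (4.5, 0.0)]
# Centroids of the actual (non-local) clusters.
centroid_centers = [centroid(O1), centroid(O2)]

candidates = [locality_centers, centroid_centers]
chosen = best_in_list(partition, candidates)
```

On this toy instance the locality-based centers cost 25 while the centroids of the actual clusters cost 16, so a single set of centers chosen without knowledge of the partition can be far from optimal, whereas a list containing a good candidate lets the best one be selected once the partition is known.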

Cite

@article{arxiv.1504.02564,
  title  = {Faster Algorithms for the Constrained k-means Problem},
  author = {Anup Bhattacharya and Ragesh Jaiswal and Amit Kumar},
  journal= {arXiv preprint arXiv:1504.02564},
  year   = {2015}
}