Faster Algorithms for the Constrained k-means Problem
Abstract
The classical center-based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters are an arbitrary partition of the dataset and the goal is to output k-centers such that the objective function is minimized. It is not difficult to argue that any algorithm that outputs a single set of k-centers, without knowing the optimal clusters, will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k-centers such that at least one of the k-centers in the list behaves well. Given an error parameter, consider the size of the smallest list of k-centers such that at least one of the k-centers gives an approximation, to within the factor set by the error parameter, w.r.t. the objective function above. In this paper, we show an upper bound on this list size by giving a randomized algorithm that outputs a list of k-centers. We also give a closely matching lower bound. This is a significant improvement over the previous result of Ding and Xu, in terms of both the running time and the size of the output list.
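The list-based guarantee described above can be illustrated with a small sketch. It is not the paper's algorithm: it only evaluates candidate sets of centers against a known target partition. The objective used here (the cheapest one-to-one assignment of target clusters to centers, measured in squared Euclidean distance) is an assumption, since the abstract does not spell the objective out, and the names `partition_cost`, `best_in_list`, and the toy data are hypothetical.

```python
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def partition_cost(clusters, centers):
    """Assumed objective: cost of the cheapest one-to-one assignment of
    target clusters to centers (brute force over permutations, small k only)."""
    k = len(clusters)
    assert len(centers) == k
    best = float("inf")
    for perm in permutations(range(k)):
        cost = sum(
            sq_dist(x, centers[perm[i]])
            for i in range(k)
            for x in clusters[i]
        )
        best = min(best, cost)
    return best


def centroid(cluster):
    """Coordinate-wise mean of a non-empty cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def best_in_list(clusters, candidate_list):
    """Cost of the best set of centers in a list of candidate sets."""
    return min(partition_cost(clusters, centers) for centers in candidate_list)


# Toy target partition without locality: each cluster mixes far-apart points.
clusters = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]

# Reference cost: each target cluster served by its own centroid.
optimal = partition_cost(clusters, [centroid(c) for c in clusters])

candidates = [
    [(0.0, 0.5), (10.0, 0.5)],  # centers a locality-based clustering would pick
    [(5.0, 0.0), (5.0, 1.0)],   # centroids of the target partition
]
locality_cost = partition_cost(clusters, candidates[0])
list_cost = best_in_list(clusters, candidates)
```

In this toy instance the locality-based centers are roughly twice as costly as the centroids of the target partition, while the two-element list contains a set of centers that matches the reference cost. This mirrors, on a very small scale, why a single output can fail where a list can succeed.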
Cite
@article{arxiv.1504.02564,
title = {Faster Algorithms for the Constrained k-means Problem},
author = {Anup Bhattacharya and Ragesh Jaiswal and Amit Kumar},
journal = {arXiv preprint arXiv:1504.02564},
year = {2015}
}