Improved Smoothed Analysis of the k-Means Method

Data Structures and Algorithms 2008-09-11 v1

Abstract

The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii (FOCS 2006) aimed at closing this gap, and they proved a bound of $\mathrm{poly}(n^k, \sigma^{-1})$ on the smoothed running-time of the k-means method, where n is the number of data points and $\sigma$ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in $n^{\sqrt{k}}$ and $\sigma^{-1}$. Second, we prove an upper bound of $k^{kd} \cdot \mathrm{poly}(n, \sigma^{-1})$, where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for $k, d \in O(\sqrt{\log n / \log \log n})$. Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
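The smoothed model described above can be illustrated with a minimal Python sketch: an arbitrary point set is perturbed by independent Gaussian noise of standard deviation $\sigma$, and the k-means method is then run on the perturbed points while counting iterations. This is only an illustration under stated assumptions, not the construction analysed in the paper: the function names `perturb` and `kmeans_iterations`, the choice of initial centers (random data points), and the handling of empty clusters are all hypothetical.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise (std. dev. sigma) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_iterations(points, k, rng, max_iter=10_000):
    """Run the k-means method; return (centers, number of iterations).

    An iteration is one assignment step followed by one update step.
    Assumption: the initial centers are k distinct random data points.
    """
    centers = rng.sample(points, k)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            # Nothing changed, so the previous iteration was the last one.
            return centers, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers, max_iter
```

A possible usage, with illustrative values for n, d, k and $\sigma$ that are not taken from the paper:

```python
rng = random.Random(0)
n, d, k, sigma = 200, 2, 5, 0.05
adversarial = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
smoothed = perturb(adversarial, sigma, rng)
centers, iterations = kmeans_iterations(smoothed, k, rng)
print(iterations)
```

Averaging `iterations` over many independent perturbations of the same input gives an empirical counterpart to the expected running-time that the bounds above control.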

Cite

@article{arxiv.0809.1715,
  title  = {Improved Smoothed Analysis of the k-Means Method},
  author = {Bodo Manthey and Heiko Röglin},
  journal= {arXiv preprint arXiv:0809.1715},
  year   = {2008}
}

Comments

To be presented at the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA 2009)
