Related papers: Improved Smoothed Analysis of the k-Means Method

k-Means has Polynomial Smoothed Complexity

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between…

Data Structures and Algorithms · Computer Science 2009-08-07 David Arthur, Bodo Manthey, Heiko Röglin

k-means requires exponentially many iterations even in the plane

The k-means algorithm is a well-known method for partitioning n points that lie in the d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its…

Computational Geometry · Computer Science 2008-12-03 Andrea Vattani

Worst-Case and Smoothed Analysis of the Hartigan-Wong Method for k-Means Clustering

We analyze the running time of the Hartigan-Wong method, an old algorithm for the $k$-means clustering problem. First, we construct an instance on the line on which the method can take $2^{\Omega(n)}$ steps to converge, demonstrating that…

Data Structures and Algorithms · Computer Science 2024-01-18 Bodo Manthey, Jesse van Rhijn

Theoretical Analysis of the $k$-Means Algorithm - A Survey

The $k$-means algorithm is one of the most widely used clustering heuristics. Despite its simplicity, analyzing its running time and quality of approximation is surprisingly difficult and can lead to deep insights that can be used to…

Data Structures and Algorithms · Computer Science 2016-02-29 Johannes Blömer, Christiane Lammersen, Melanie Schmidt, Christian Sohler

On Variants of k-means Clustering

Clustering problems often arise in fields like data mining, machine learning, etc., to group a collection of objects into similar groups with respect to a similarity (or dissimilarity) measure. Among the clustering problems,…

Computational Geometry · Computer Science 2015-12-10 Sayan Bandyapadhyay, Kasturi Varadarajan

Provably faster randomized and quantum algorithms for $k$-means clustering via uniform sampling

The $k$-means algorithm (Lloyd's algorithm) is a widely used method for clustering unlabeled data. A key bottleneck of the $k$-means algorithm is that each iteration requires time linear in the number of data points, which can be expensive…

Quantum Physics · Physics 2025-10-14 Tyler Chen, Archan Ray, Akshay Seshadri, Dylan Herman, Bao Bach, Pranav Deshpande, Abhishek Som, Niraj Kumar, Marco Pistoia

Optimal Smoothed Analysis of the Simplex Method

Smoothed analysis is a method for analyzing the performance of algorithms, used especially for those algorithms whose running time in practice is significantly better than what can be proven through worst-case analysis. Spielman and Teng…

Data Structures and Algorithms · Computer Science 2026-05-26 Eleon Bach, Sophie Huiberts

$t$-$k$-means: A Robust and Stable $k$-means Variant

The $k$-means algorithm is one of the most classical clustering methods and has been widely and successfully used in signal processing. However, due to the thin-tailed property of the Gaussian distribution, the $k$-means algorithm suffers from…

Machine Learning · Computer Science 2021-02-02 Yiming Li, Yang Zhang, Qingtao Tang, Weipeng Huang, Yong Jiang, Shu-Tao Xia

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculation between all data points and the centers, the time cost will be high when the size of…

Data Structures and Algorithms · Computer Science 2011-08-08 Raied Salman, Vojislav Kecman, Qi Li, Robert Strack, Erik Test

Learning Mixtures of Gaussians using the k-means Algorithm

One of the most popular algorithms for clustering in Euclidean space is the $k$-means algorithm; $k$-means is difficult to analyze mathematically, and few theoretical guarantees are known about it, particularly when the data is…

Machine Learning · Computer Science 2009-12-02 Kamalika Chaudhuri, Sanjoy Dasgupta, Andrea Vattani

A balanced k-means algorithm for weighted point sets

The classical $k$-means algorithm for partitioning $n$ points in $\mathbb{R}^d$ into $k$ clusters is one of the most popular and widely spread clustering methods. The need to respect prescribed lower bounds on the cluster sizes has been…

Optimization and Control · Mathematics 2016-08-04 Steffen Borgwardt, Andreas Brieden, Peter Gritzmann

Coresets for constrained k-median and k-means clustering in low dimensional Euclidean space

We study (Euclidean) $k$-median and $k$-means with constraints in the streaming model. There have been recent efforts to design unified algorithms to solve constrained $k$-means problems without using knowledge of the specific constraint at…

Data Structures and Algorithms · Computer Science 2021-06-15 Melanie Schmidt, Julian Wargalla

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

Faster Algorithms for the Constrained k-means Problem

The classical center based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise…

Data Structures and Algorithms · Computer Science 2015-04-13 Anup Bhattacharya, Ragesh Jaiswal, Amit Kumar

An Improved Greedy Approximation for (Metric) $k$-Means

Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a…

Data Structures and Algorithms · Computer Science 2026-05-29 Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, Ernest van Wijland

A Nearly Tight Analysis of Greedy k-means++

The famous $k$-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the $k$-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random and each of the…

Data Structures and Algorithms · Computer Science 2022-07-19 Christoph Grunau, Ahmet Alper Özüdoğru, Václav Rozhoň, Jakub Tětek

Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. The global $k$-means is a…

Machine Learning · Computer Science 2023-07-17 Georgios Vardakas, Aristidis Likas

K+ Means : An Enhancement Over K-Means Clustering Algorithm

K-means (MacQueen, 1967) [1] is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set to a predefined, say K number of…

Machine Learning · Computer Science 2017-06-23 Srikanta Kolay, Kumar Sankar Ray, Abhoy Chand Mondal

Fast k-means based on KNN Graph

In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can be prohibitively high when the data size and the number of clusters are large. It is well…

Machine Learning · Computer Science 2017-05-05 Cheng-Hao Deng, Wan-Lei Zhao

Breathing K-Means: Superior K-Means Solutions through Dynamic K-Values

We introduce the breathing k-means algorithm, which on average significantly improves solutions obtained by the widely-known greedy k-means++ algorithm, the default method for k-means clustering in the scikit-learn package. The improvements…

Machine Learning · Computer Science 2024-08-20 Bernd Fritzke