Related papers: Mini-Batch Kernel $k$-means

Mini-batch $k$-means terminates within $O(d/\epsilon)$ iterations

We answer the question: "Does local progress (on batches) imply global progress (on the entire dataset) for mini-batch $k$-means?" Specifically, we consider mini-batch $k$-means which terminates only when the improvement in the quality of…

Machine Learning · Computer Science 2023-04-04 Gregory Schwartzman

A Faster $k$-means++ Algorithm

$k$-means++ is an important algorithm for choosing initial cluster centers for the $k$-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with nearly optimal running time. Given $n$…

Data Structures and Algorithms · Computer Science 2024-02-15 Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin, Danyang Zhuo

Faster Algorithms for the Constrained k-means Problem

The classical center based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise…

Data Structures and Algorithms · Computer Science 2015-04-13 Anup Bhattacharya, Ragesh Jaiswal, Amit Kumar

Coresets for Kernel Clustering

We devise coresets for kernel $k$-Means with a general kernel, and use them to obtain new, more efficient, algorithms. Kernel $k$-Means has superior clustering capability compared to classical $k$-Means, particularly when clusters are…

Data Structures and Algorithms · Computer Science 2024-04-09 Shaofeng H.-C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang

Nested Mini-Batch K-Means

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already…

Machine Learning · Statistics 2016-09-14 James Newling, François Fleuret

A New Rejection Sampling Approach to $k$-$\mathtt{means}$++ With Improved Trade-Offs

The $k$-$\mathtt{means}$++ seeding algorithm (Arthur & Vassilvitskii, 2007) is widely used in practice for the $k$-means clustering problem where the goal is to cluster a dataset $\mathcal{X} \subset \mathbb{R}^d$ into $k$ clusters. The…

Data Structures and Algorithms · Computer Science 2025-02-05 Poojan Shah, Shashwat Agrawal, Ragesh Jaiswal

Convergence rate of stochastic k-means

We analyze online (Bottou and Bengio) and mini-batch (Sculley) $k$-means variants. Both scale up the widely used $k$-means algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised…

Machine Learning · Computer Science 2016-11-17 Cheng Tang, Claire Monteleoni

OneBatchPAM: A Fast and Frugal K-Medoids Algorithm

This paper proposes a novel k-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based…

Machine Learning · Computer Science 2025-02-03 Antoine de Mathelin, Nicolas Enrique Cecchi, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate…

Machine Learning · Computer Science 2016-05-31 Eirikur Agustsson, Radu Timofte, Luc Van Gool

Scalable Kernel Clustering: Approximate Kernel k-means

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…

Computer Vision and Pattern Recognition · Computer Science 2014-02-18 Radha Chitta, Rong Jin, Timothy C. Havens, Anil K. Jain

Finer-Grained Hardness of Kernel Density Estimation

In batch Kernel Density Estimation (KDE) for a kernel function $f$, we are given as input $2n$ points $x^{(1)}, \cdots, x^{(n)}, y^{(1)}, \cdots, y^{(n)}$ in dimension $m$, as well as a vector $v \in \mathbb{R}^n$. These inputs implicitly…

Data Structures and Algorithms · Computer Science 2024-07-03 Josh Alman, Yunfeng Guan

A Quantum Approximation Scheme for k-Means

We give a quantum approximation scheme (i.e., $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on…

Quantum Physics · Physics 2025-05-27 Ragesh Jaiswal

Fully Dynamic k-Means Coreset in Near-Optimal Update Time

We study in this paper the problem of maintaining a solution to $k$-median and $k$-means clustering in a fully dynamic setting. To do so, we present an algorithm to efficiently maintain a coreset, a compressed version of the dataset, that…

Data Structures and Algorithms · Computer Science 2024-07-01 Max Dupré la Tour, Monika Henzinger, David Saulpic

Seeding K-Means using Method of Moments

K-means is one of the most widely used algorithms for clustering in Data Mining applications, which attempts to minimize the sum of the square of the Euclidean distance of the points in the clusters from the respective means of the…

Machine Learning · Computer Science 2016-11-01 Sayantan Dasgupta

Fast Randomized Kernel Methods With Statistical Guarantees

One approach to improving the running time of kernel-based machine learning methods is to build a small sketch of the input and use it in lieu of the full kernel matrix in the machine learning task of interest. Here, we describe a version…

Machine Learning · Statistics 2015-11-10 Ahmed El Alaoui, Michael W. Mahoney

Linear time small coresets for k-mean clustering of segments with applications

We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S},X) := \sum_{S \in \mathcal{S}} \min_{x \in X} D(S,x)$, where…

Machine Learning · Computer Science 2025-11-21 David Denisov, Shlomi Dolev, Dan Felmdan, Michael Segal

k-Means for Streaming and Distributed Big Sparse Data

We provide the first streaming algorithm for computing a provable approximation to the $k$-means of sparse Big Data. Here, sparse Big Data is a set of $n$ vectors in $\mathbb{R}^d$, where each vector has $O(1)$ non-zero entries, and…

Data Structures and Algorithms · Computer Science 2016-02-09 Artem Barger, Dan Feldman

q-means: A quantum algorithm for unsupervised machine learning

Quantum machine learning is one of the most promising applications of a full-scale quantum computer. Over the past few years, many quantum machine learning algorithms have been proposed that can potentially offer considerable speedups over…

Quantum Physics · Physics 2021-06-14 Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, Anupam Prakash

Simple, Scalable and Effective Clustering via One-Dimensional Projections

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…

Machine Learning · Computer Science 2023-10-26 Moses Charikar, Monika Henzinger, Lunjia Hu, Maxmilian Vötsch, Erik Waingarten

Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly. In this work…

Data Structures and Algorithms · Computer Science 2017-11-07 Cameron Musco, David P. Woodruff