Related papers: Replicable Clustering

Simple, Scalable and Effective Clustering via One-Dimensional Projections

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…

Machine Learning · Computer Science 2023-10-26 Moses Charikar , Monika Henzinger , Lunjia Hu , Maximilian Vötsch , Erik Waingarten

Fully Scalable MPC Algorithms for Clustering in High Dimension

We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be…

Data Structures and Algorithms · Computer Science 2024-07-09 Artur Czumaj , Guichen Gao , Shaofeng H.-C. Jiang , Robert Krauthgamer , Pavel Veselý

Scalable Kernel Clustering: Approximate Kernel k-means

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…

Computer Vision and Pattern Recognition · Computer Science 2014-02-18 Radha Chitta , Rong Jin , Timothy C. Havens , Anil K. Jain

On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are stable under additive or multiplicative perturbation of data. This has two caveats. First,…

Data Structures and Algorithms · Computer Science 2019-02-27 Amit Deshpande , Anand Louis , Apoorv Vikram Singh

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta , Nave Frost , Michal Moshkovitz , Cyrus Rashtchian

Differentially Private Clustering in Data Streams

Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy becomes a central…

Data Structures and Algorithms · Computer Science 2025-10-03 Alessandro Epasto , Tamalika Mukherjee , Peilin Zhong

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains…

Quantum Physics · Physics 2023-06-06 Yecheng Xue , Xiaoyu Chen , Tongyang Li , Shaofeng H.-C. Jiang

Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou , Kyle A. Gallivan , Adrian Barbu

Almost Tight Approximation Algorithms for Explainable Clustering

Recently, due to an increasing interest in transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goal of accuracy and interpretability by humans. In this paper,…

Machine Learning · Computer Science 2021-07-16 Hossein Esfandiari , Vahab Mirrokni , Shyam Narayanan

Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

We study $k$-means clustering in a semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we investigate the following question: how many oracle queries are…

Data Structures and Algorithms · Computer Science 2018-11-07 Buddhima Gamlath , Sangxia Huang , Ola Svensson

Replicability in High Dimensional Statistics

The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced…

Machine Learning · Statistics 2024-06-06 Max Hopkins , Russell Impagliazzo , Daniel Kane , Sihan Liu , Christopher Ye

Replicability in Reinforcement Learning

We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model. Inspired by…

Machine Learning · Computer Science 2023-10-31 Amin Karbasi , Grigoris Velegkas , Lin F. Yang , Felix Zhou

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

Cross-Study Replicability in Cluster Analysis

In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction, playing a critical role in the identification of novel cancer subtypes, often with direct implications for patient management.…

Methodology · Statistics 2023-05-11 Lorenzo Masoero , Emma Thomas , Giovanni Parmigiani , Svitlana Tyekucheva , Lorenzo Trippa

Reproducibility in Learning

We introduce the notion of a reproducible algorithm in the context of learning. A reproducible learning algorithm is resilient to variations in its samples -- with high probability, it returns the exact same output when run on two samples…

Machine Learning · Computer Science 2023-04-17 Russell Impagliazzo , Rex Lei , Toniann Pitassi , Jessica Sorrell

Fully Dynamic Euclidean k-Means

We consider the Euclidean $k$-means clustering problem in a dynamic setting, where we have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ subject to point insertions/deletions in $\mathbb{R}^d$. We…

Data Structures and Algorithms · Computer Science 2026-04-03 Sayan Bhattacharya , Martín Costa , Ermiya Farokhnejad , Shaofeng H.-C. Jiang , Yaonan Jin , Jianing Lou

Almost-Optimal Upper and Lower Bounds for Clustering in Low Dimensional Euclidean Spaces

The $k$-median and $k$-means clustering objectives are classic objectives for modeling clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative…

Computational Geometry · Computer Science 2026-03-11 Vincent Cohen-Addad , Karthik C. S. , David Saulpic , Chris Schwiegelshohn

Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $\mathbb{R}^d$ and a parameter $0< \alpha <\frac 1 2$ such that an…

Data Structures and Algorithms · Computer Science 2021-11-15 Ilias Diakonikolas , Daniel M. Kane , Daniel Kongsgaard , Jerry Li , Kevin Tian

Approximate Clustering with Same-Cluster Queries

Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?"…

Data Structures and Algorithms · Computer Science 2017-10-05 Nir Ailon , Anup Bhattacharya , Ragesh Jaiswal , Amit Kumar

Faster Algorithms for the Constrained k-means Problem

The classical center based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise…

Data Structures and Algorithms · Computer Science 2015-04-13 Anup Bhattacharya , Ragesh Jaiswal , Amit Kumar