Related papers: Robust Communication-Optimal Distributed Clusterin…

Distributed $k$-Clustering for Data with Heavy Noise

In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-30 Xiangyu Guo , Shi Li

On the Communication Complexity of Distributed Clustering

In this paper we give a first set of communication lower bounds for distributed clustering problems, in particular, for k-center, k-median and k-means. When the input is distributed across a large number of machines and the number of…

Computational Complexity · Computer Science 2017-02-03 Qin Zhang

Faster Algorithms for the Constrained k-means Problem

The classical center based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise…

Data Structures and Algorithms · Computer Science 2015-04-13 Anup Bhattacharya , Ragesh Jaiswal , Amit Kumar

Distributed Partial Clustering

Recent years have witnessed an increasing popularity of algorithm design for distributed data, largely due to the fact that massive datasets are often collected and stored in different locations. In the distributed setting communication…

Data Structures and Algorithms · Computer Science 2017-06-06 Sudipto Guha , Yi Li , Qin Zhang

Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided…

Data Structures and Algorithms · Computer Science 2018-12-31 Maria-Florina Balcan , Colin White

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta , Nave Frost , Michal Moshkovitz , Cyrus Rashtchian

Clustering under Perturbation Resilience

Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial~\cite{BL} proposed analyzing objective based clustering problems under the assumption…

Machine Learning · Computer Science 2016-12-13 Maria Florina Balcan , Yingyu Liang

Near-Optimal Clustering in the $k$-machine model

The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-24 Sayan Bandyapadhyay , Tanmay Inamdar , Shreyas Pai , Sriram V. Pemmaraju

Fast Distributed k-Means with a Small Number of Rounds

We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Tom Hess , Ron Visbord , Sivan Sabato

Optimal Time Bounds for Approximate Clustering

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect the emph{k-median}…

Data Structures and Algorithms · Computer Science 2013-01-07 Ramgopal Mettu , Greg Plaxton

A Practical Algorithm for Distributed Clustering and Outlier Detection

We study the classic $k$-means/median clustering, which are fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-12 Jiecao Chen , Erfan Sadeqi Azer , Qin Zhang

Robust Clustering Using Outlier-Sparsity Regularization

Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the…

Machine Learning · Statistics 2015-05-27 Pedro A. Forero , Vassilis Kekatos , Georgios B. Giannakis

On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First,…

Data Structures and Algorithms · Computer Science 2019-02-27 Amit Deshpande , Anand Louis , Apoorv Vikram Singh

The Informativeness of K -Means for Learning Mixture Models

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the {\it correct target clustering} of the samples…

Machine Learning · Statistics 2022-08-26 Zhaoqiang Liu , Vincent Y. F. Tan

FPT Approximations for Capacitated/Fair Clustering with Outliers

Clustering problems such as $k$-Median, and $k$-Means, are motivated from applications such as location planning, unsupervised learning among others. In such applications, it is important to find the clustering of points that is not…

Data Structures and Algorithms · Computer Science 2023-05-03 Rajni Dabas , Neelima Gupta , Tanmay Inamdar

$k$-center Clustering under Perturbation Resilience

The $k$-center problem is a canonical and long-studied facility location and clustering problem with many applications in both its symmetric and asymmetric forms. Both versions of the problem have tight approximation factors on worst case…

Data Structures and Algorithms · Computer Science 2019-01-01 Maria-Florina Balcan , Nika Haghtalab , Colin White

k-Center Clustering with Outliers in Sliding Windows

Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks for the best solution that disregards a…

Machine Learning · Computer Science 2022-02-28 Paolo Pellizzoni , Andrea Pietracaprina , Geppino Pucci

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

Distributionally Robust K-Means Clustering

K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing k-means as Lloyd--Max quantization of the empirical distribution, we develop a…

Machine Learning · Computer Science 2026-04-14 Vikrant Malik , Taylan Kargin , Babak Hassibi

Simple KNN-Based Outlier Detection Achieves Robust Clustering

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the $\textit{robust $k$-Means}$ problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means…

Machine Learning · Computer Science 2026-05-11 Tianle Jiang , Yufa Zhou