Related papers: Robust Communication-Optimal Distributed Clusterin…

200 papers

In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-30 Xiangyu Guo, Shi Li

In this paper we give a first set of communication lower bounds for distributed clustering problems, in particular, for k-center, k-median and k-means. When the input is distributed across a large number of machines and the number of…

Computational Complexity · Computer Science 2017-02-03 Qin Zhang

The classical center based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise…

Data Structures and Algorithms · Computer Science 2015-04-13 Anup Bhattacharya, Ragesh Jaiswal, Amit Kumar

Recent years have witnessed an increasing popularity of algorithm design for distributed data, largely due to the fact that massive datasets are often collected and stored in different locations. In the distributed setting communication…

Data Structures and Algorithms · Computer Science 2017-06-06 Sudipto Guha, Yi Li, Qin Zhang

Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided…

Data Structures and Algorithms · Computer Science 2018-12-31 Maria-Florina Balcan, Colin White

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz, Cyrus Rashtchian

Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial [BL] proposed analyzing objective based clustering problems under the assumption…

Machine Learning · Computer Science 2016-12-13 Maria-Florina Balcan, Yingyu Liang

The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-24 Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, Sriram V. Pemmaraju

We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Tom Hess, Ron Visbord, Sivan Sabato

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the \emph{k-median}…

Data Structures and Algorithms · Computer Science 2013-01-07 Ramgopal Mettu, Greg Plaxton

We study the classic $k$-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-12 Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang

Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the…

Machine Learning · Statistics 2015-05-27 Pedro A. Forero, Vassilis Kekatos, Georgios B. Giannakis

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First,…

Data Structures and Algorithms · Computer Science 2019-02-27 Amit Deshpande, Anand Louis, Apoorv Vikram Singh

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the \emph{correct target clustering} of the samples…

Machine Learning · Statistics 2022-08-26 Zhaoqiang Liu, Vincent Y. F. Tan

Clustering problems such as $k$-Median and $k$-Means are motivated by applications such as location planning and unsupervised learning, among others. In such applications, it is important to find the clustering of points that is not…

Data Structures and Algorithms · Computer Science 2023-05-03 Rajni Dabas, Neelima Gupta, Tanmay Inamdar

The $k$-center problem is a canonical and long-studied facility location and clustering problem with many applications in both its symmetric and asymmetric forms. Both versions of the problem have tight approximation factors on worst case…

Data Structures and Algorithms · Computer Science 2019-01-01 Maria-Florina Balcan, Nika Haghtalab, Colin White

Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks the best solution that disregards a…

Machine Learning · Computer Science 2022-02-28 Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing k-means as Lloyd--Max quantization of the empirical distribution, we develop a…

Machine Learning · Computer Science 2026-04-14 Vikrant Malik, Taylan Kargin, Babak Hassibi

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the \textit{robust $k$-Means} problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means…

Machine Learning · Computer Science 2026-05-11 Tianle Jiang, Yufa Zhou