English
Related papers

Related papers: Clustering with Distributed Data

200 papers

K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, Normalization based K-means clustering algorithm(N-K means) is proposed. Proposed N-K means…

Machine Learning · Computer Science 2015-03-04 Deepali Virmani , Shweta Taneja , Geetika Malhotra

In this paper we provide a fully distributed implementation of the k-means clustering algorithm, intended for wireless sensor networks where each agent is endowed with a possibly high-dimensional observation (e.g., position, humidity,…

Machine Learning · Computer Science 2014-11-11 Gabriele Oliva , Roberto Setola , Christoforos N. Hadjicostis

In this paper, we present a distributed algorithm which implements the $k$-means algorithm in a distributed fashion for multi-agent systems with directed communication links. The goal of $k$-means is to partition the network's agents in…

Systems and Control · Electrical Eng. & Systems 2022-07-19 Apostolos I. Rikos , Gabriele Oliva , Christoforos N. Hadjicostis , Karl H. Johansson

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…

Machine Learning · Computer Science 2023-10-26 Moses Charikar , Monika Henzinger , Lunjia Hu , Maxmilian Vötsch , Erik Waingarten

We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Tom Hess , Ron Visbord , Sivan Sabato

$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…

Computer Vision and Pattern Recognition · Computer Science 2013-12-12 Jingdong Wang , Jing Wang , Qifa Ke , Gang Zeng , Shipeng Li

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following…

Machine Learning · Computer Science 2020-01-28 Maria Florina Balcan , Steven Ehrlich , Yingyu Liang

Though mostly used as a clustering algorithm, k-means are originally designed as a quantization algorithm. Namely, it aims at providing a compression of a probability distribution with k points. Building upon [21, 33], we try to investigate…

Statistics Theory · Mathematics 2018-01-31 Clément Levrard

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev , Nenad Mladenovic , Bassem Jarboui , Ravil Mussabayev

Clustering is a widely used and powerful machine learning technique, but its effectiveness is often limited by the need to specify the number of clusters, k, or by relying on thresholds that implicitly determine k. We introduce k*-means, a…

Machine Learning · Computer Science 2025-05-20 Louis Mahon , Mirella Lapata

Clustering is an effective technique in data mining to generate groups that are the matter of interest. Among various clustering approaches, the family of k-means algorithms and min-cut algorithms gain most popularity due to their…

Machine Learning · Computer Science 2014-11-25 Xiaojun Chang , Feiping Nie , Zhigang Ma , Yi Yang

Algorithms for clustering points in metric spaces is a long-studied area of research. Clustering has seen a multitude of work both theoretically, in understanding the approximation guarantees possible for many objective functions such as…

Data Structures and Algorithms · Computer Science 2019-05-27 Maria-Florina Balcan , Travis Dick , Colin White

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a…

Machine Learning · Computer Science 2019-06-04 Feiyu Chen , Yuchen Yang , Liwei Xu , Taiping Zhang , Yin Zhang

The $k$-means algorithm (Lloyd's algorithm) is a widely used method for clustering unlabeled data. A key bottleneck of the $k$-means algorithm is that each iteration requires time linear in the number of data points, which can be expensive…

In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on…

Data Structures and Algorithms · Computer Science 2019-03-08 Pranjal Awasthi , Ainesh Bakshi , Maria-Florina Balcan , Colin White , David Woodruff

There has been much progress on efficient algorithms for clustering data points generated by a mixture of $k$ probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between…

Data Structures and Algorithms · Computer Science 2010-04-13 Amit Kumar , Ravindran Kannan

The $k$-means algorithm is arguably the most popular nonparametric clustering method but cannot generally be applied to datasets with incomplete records. The usual practice then is to either impute missing values under an assumed…

Machine Learning · Statistics 2018-09-11 Andrew Lithio , Ranjan Maitra

Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes. We propose a natural definition of a…

Machine Learning · Computer Science 2013-04-25 Miguel Á. Carreira-Perpiñán , Weiran Wang

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…

Machine Learning · Computer Science 2024-10-22 Andreas Lang , Erich Schubert

Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters,…

Computer Science and Game Theory · Computer Science 2024-02-07 Jakob Burkhardt , Ioannis Caragiannis , Karl Fehrs , Matteo Russo , Chris Schwiegelshohn , Sudarshan Shyam
‹ Prev 1 2 3 10 Next ›