English
Related papers

Related papers: Distributed Partial Clustering

200 papers

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following…

Machine Learning · Computer Science 2020-01-28 Maria Florina Balcan , Steven Ehrlich , Yingyu Liang

In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on…

Data Structures and Algorithms · Computer Science 2019-03-08 Pranjal Awasthi , Ainesh Bakshi , Maria-Florina Balcan , Colin White , David Woodruff

In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-30 Xiangyu Guo , Shi Li

We study the classic $k$-means/median clustering, which are fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-12 Jiecao Chen , Erfan Sadeqi Azer , Qin Zhang

One key use of k-means clustering is to identify cluster prototypes which can serve as representative points for a dataset. However, a drawback of using k-means cluster centers as representative points is that such points distort the…

Machine Learning · Statistics 2019-11-15 Arvind Krishna , Simon Mak , Roshan Joseph

Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache , M-Tahar Kechadi

In this paper we give a first set of communication lower bounds for distributed clustering problems, in particular, for k-center, k-median and k-means. When the input is distributed across a large number of machines and the number of…

Computational Complexity · Computer Science 2017-02-03 Qin Zhang

Clustering large datasets is a fundamental problem with a number of applications in machine learning. Data is often collected on different sites and clustering needs to be performed in a distributed manner with low communication. We would…

Data Structures and Algorithms · Computer Science 2017-02-02 Jiecao Chen , He Sun , David P. Woodruff , Qin Zhang

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

The increasing popularity of cloud computing has resulted in a proliferation of data centers. Effective placement of data centers improves network performance and minimizes clients' perceived latency. The problem of determining the optimal…

Networking and Internet Architecture · Computer Science 2018-02-06 Wuqiong Luo , Wee Peng Tay , Peng Sun , Yonggang Wen

We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Tom Hess , Ron Visbord , Sivan Sabato

Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate,…

Databases · Computer Science 2012-03-20 Saptarsi Goswami , Amlan Chakrabarti

Nowadays, with the widespread of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitous available and the focus of machine learning has shifted from being able to infer from small training samples to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-07 Radu Cristian Ionescu

The $k$-center problem is a central optimization problem with numerous applications for machine learning, data mining, and communication networks. Despite extensive study in various scenarios, it surprisingly has not been thoroughly…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-26 Leyla Biabani , Ami Paz

Artificial intelligence has made remarkable progress in handling complex tasks, thanks to advances in hardware acceleration and machine learning algorithms. However, to acquire more accurate outcomes and solve more complex issues,…

Machine Learning · Computer Science 2023-09-12 Mohammad Dehghani , Zahra Yazdanparast

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache , Nhien-An Le-Khac , M-Tahar Kechadi

We consider $K$-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering…

Machine Learning · Computer Science 2019-01-03 Soummya Kar , Brian Swenson

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta , Nave Frost , Michal Moshkovitz , Cyrus Rashtchian

Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters,…

Computer Science and Game Theory · Computer Science 2024-02-07 Jakob Burkhardt , Ioannis Caragiannis , Karl Fehrs , Matteo Russo , Chris Schwiegelshohn , Sudarshan Shyam

The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-24 Sayan Bandyapadhyay , Tanmay Inamdar , Shreyas Pai , Sriram V. Pemmaraju
‹ Prev 1 2 3 10 Next ›