Related papers: Communication-Optimal Distributed Clustering

Distributed Graph Clustering by Load Balancing

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of…

Data Structures and Algorithms · Computer Science · 2019-04-12 · He Sun, Luca Zanetti

Communication-Optimal Distributed Dynamic Graph Clustering

We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images and web networks, when graph updates such as node/edge insertions/deletions are observed distributively. We propose…

Data Structures and Algorithms · Computer Science · 2018-11-16 · Chun Jiang Zhu, Tan Zhu, Kam-Yiu Lam, Song Han, Jinbo Bi

When Distributed Computation is Communication Expensive

We consider a number of fundamental statistical and graph problems in the message-passing model, where we have $k$ machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the…

Data Structures and Algorithms · Computer Science · 2013-07-29 · David P. Woodruff, Qin Zhang

Distributed Graph Clustering and Sparsification

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of…

Data Structures and Algorithms · Computer Science · 2017-11-06 · He Sun, Luca Zanetti

Fast communication-efficient spectral clustering over distributed data

The last decades have seen a surge of interests in distributed computing thanks to advances in clustered computing and big data technology. Existing distributed algorithms typically assume all the data are already in one place, and…

Machine Learning · Computer Science · 2019-05-07 · Donghui Yan, Yingjie Wang, Jin Wang, Guodong Wu, Honggang Wang

Communication-Efficient Distributed Graph Clustering and Sparsification under Duplication Models

In this paper, we consider the problem of clustering graph nodes and sparsifying graph edges over distributed graphs, when graph edges with possibly edge duplicates are observed at physically remote sites. Although edge duplicates across…

Data Structures and Algorithms · Computer Science · 2023-02-21 · Chun Jiang Zhu

Clustering based on Random Graph Model embedding Vertex Features

Large datasets with interactions between objects are common to numerous scientific fields (i.e. social science, internet, biology...). The interactions naturally define a graph and a common way to explore or summarize such dataset is graph…

Applications · Statistics · 2009-10-13 · Hugo Zanghi, Stevenn Volant, Christophe Ambroise

$k$-Center Clustering in Distributed Models

The $k$-center problem is a central optimization problem with numerous applications for machine learning, data mining, and communication networks. Despite extensive study in various scenarios, it surprisingly has not been thoroughly…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-26 · Leyla Biabani, Ami Paz

On a Distributed Approach for Density-based Clustering

Efficient extraction of useful knowledge from these data is still a challenge, mainly when the data is distributed, heterogeneous and of different quality depending on its corresponding local infrastructure. To reduce the overhead cost,…

Databases · Computer Science · 2017-04-17 · Nhien-An Le-Khac, M-Tahar Kechadi

Distributed Algorithms for Finding Local Clusters Using Heat Kernel Pagerank

A distributed algorithm performs local computations on pieces of input and communicates the results through given communication links. When processing a massive graph in a distributed algorithm, local outputs must be configured as a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-12-06 · Fan Chung, Olivia Simpson

Scaling-up Distributed Processing of Data Streams for Machine Learning

Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…

Machine Learning · Computer Science · 2020-12-01 · Matthew Nokleby, Haroon Raja, Waheed U. Bajwa

Clustering multilayer graphs with missing nodes

Relationship between agents can be conveniently represented by graphs. When these relationships have different modalities, they are better modelled by multilayer graphs where each layer is associated with one modality. Such graphs arise…

Machine Learning · Statistics · 2021-03-05 · Guillaume Braun, Hemant Tyagi, Christophe Biernacki

On Maintaining Linear Convergence of Distributed Learning and Optimization under Limited Communication

In distributed optimization and machine learning, multiple nodes coordinate to solve large problems. To do this, the nodes need to compress important algorithm information to bits so that it can be communicated over a digital channel. The…

Optimization and Control · Mathematics · 2020-12-02 · Sindri Magnússon, Hossein Shokri-Ghadikolaei, Na Li

Robust Communication-Optimal Distributed Clustering Algorithms

In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on…

Data Structures and Algorithms · Computer Science · 2019-03-08 · Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, David Woodruff

Distributed Partial Clustering

Recent years have witnessed an increasing popularity of algorithm design for distributed data, largely due to the fact that massive datasets are often collected and stored in different locations. In the distributed setting communication…

Data Structures and Algorithms · Computer Science · 2017-06-06 · Sudipto Guha, Yi Li, Qin Zhang

Communication-Efficient and Exact Clustering Distributed Streaming Data

A widely used approach to clustering a single data stream is the two-phased approach in which the online phase creates and maintains micro-clusters while the off-line phase generates the macro-clustering from the micro-clusters. We use this…

Databases · Computer Science · 2012-09-20 · Dang-Hoan Tran

Federated Optimization: Distributed Optimization Beyond the Datacenter

We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of nodes, but the goal remains to…

Machine Learning · Computer Science · 2015-11-12 · Jakub Konečný, Brendan McMahan, Daniel Ramage

Scaling Graph Clustering with Distributed Sketches

The unsupervised learning of community structure, in particular the partitioning vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses the…

Machine Learning · Computer Science · 2020-07-27 · Benjamin W. Priest, Alec Dunton, Geoffrey Sanders

Rethinking Personalized Federated Learning with Clustering-based Dynamic Graph Propagation

Most existing personalized federated learning approaches are based on intricate designs, which often require complex implementation and tuning. In order to address this limitation, we propose a simple yet effective personalized federated…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-30 · Jiaqi Wang, Yuzhong Chen, Yuhang Wu, Mahashweta Das, Hao Yang, Fenglong Ma

A core-set approach for distributed quadratic programming in big-data classification

A new challenge for learning algorithms in cyber-physical network systems is the distributed solution of big-data classification problems, i.e., problems in which both the number of training samples and their dimension is high. Motivated by…

Optimization and Control · Mathematics · 2017-02-16 · Giuseppe Notarstefano