Related papers: RedSync : Reducing Synchronization Traffic for Dis…

Learned Gradient Compression for Distributed Deep Learning

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan, Yiming Chen, Giannis Bekoulis, Nikos Deligiannis

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the…

Machine Learning · Computer Science 2021-03-19 Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini

RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and…

Machine Learning · Computer Science 2021-04-23 Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan

Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks

Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most well-known…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-10 Xin Zhang, Jia Liu, Zhengyuan Zhu, Elizabeth S. Bentley

RapidGNN: Energy and Communication-Efficient Distributed Training on Large-Scale Graph Neural Networks

Graph Neural Networks (GNNs) have become popular across a diverse set of tasks in exploring structural relationships between entities. However, due to the highly connected structure of the datasets, distributed training of GNNs on…

Machine Learning · Computer Science 2025-09-08 Arefin Niam, Tevfik Kosar, M S Q Zulkar Nine

RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks

Graph Neural Networks (GNNs) have achieved state-of-the-art (SOTA) performance in diverse domains. However, training GNNs on large-scale graphs poses significant challenges due to high memory demands and significant communication overhead…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-19 Arefin Niam, M S Q Zulkar Nine

CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning

We focus on the commonly used synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been a growing interest to develop efficient and robust gradient aggregation strategies that overcome two key…

Machine Learning · Statistics 2021-09-30 Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Amir Salman Avestimehr

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

It is important to scale out deep neural network (DNN) training for reducing model training time. The high communication overhead is one of the major performance bottlenecks for distributed DNN training across multiple GPUs. Our…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-23 Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen

Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication

Currently, progressively larger deep neural networks are trained on ever growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in…

Machine Learning · Computer Science 2018-05-23 Felix Sattler, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training

Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of data and aggregate updates to produce a globally shared model. The periodic synchronization at each iteration incurs…

Machine Learning · Computer Science 2024-01-30 Sahil Tyagi, Martin Swany

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient…

Machine Learning · Computer Science 2017-12-08 Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan

CaPGNN: Optimizing Parallel Graph Neural Network Training with Joint Caching and Resource-Aware Graph Partitioning

Graph-structured data is ubiquitous in the real world, and Graph Neural Networks (GNNs) have become increasingly popular in various fields due to their ability to process such irregular data directly. However, as data scale, GNNs become…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-10 Xianfeng Song, Yi Zou, Zheng Shi

Scalable Graph Convolutional Network Training on Distributed-Memory Systems

Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. The large data sizes of graphs and their vertex features make scalable training algorithms and distributed memory systems necessary. Since the…

Machine Learning · Computer Science 2022-12-14 Gunduz Vehbi Demirci, Aparajita Haldar, Hakan Ferhatosmanoglu

Network-Density-Controlled Decentralized Parallel Stochastic Gradient Descent in Wireless Systems

This paper proposes a communication strategy for decentralized learning on wireless systems. Our discussion is based on the decentralized parallel stochastic gradient descent (D-PSGD), which is one of the state-of-the-art algorithms for…

Networking and Internet Architecture · Computer Science 2020-02-26 Koya Sato, Yasuyuki Satoh, Daisuke Sugimura

Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu, Thomas Flynn, Shinjae Yoo, Nicholas D'Imperio

Reducing Data Motion to Accelerate the Training of Deep Neural Networks

This paper reduces the cost of DNN training by decreasing the amount of data movement across heterogeneous architectures composed of several GPUs and multicore CPU devices. In particular, this paper proposes an algorithm to dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-07 Sicong Zhuang, Cristiano Malossi, Marc Casas

PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning

Large-scale deep neural networks (DNN) exhibit excellent performance for various tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters. A main bottleneck is the resulting…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-27 Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher

On the Utility of Gradient Compression in Distributed Training Systems

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-01 Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos