Related papers: Domain-specific Communication Optimization for Dis…

LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference in Image Recognition

Deep neural networks (DNNs) have inspired new studies in myriad edge applications with robots, autonomous agents, and Internet-of-things (IoT) devices. However, performing inference of DNNs in the edge is still a severe challenge, mainly…

Signal Processing · Electrical Eng. & Systems 2020-11-18 Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Younmin Bae, Da Eun Shim, Hyojong Kim, Sung-Kyu Lim, Michael S. Ryoo, Hyesoon Kim

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Distributed deep learning (DL) has become prevalent in recent years to reduce training time by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and datasets. However, system scalability is limited by…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-04 Zhenheng Tang, Shaohuai Shi, Wei Wang, Bo Li, Xiaowen Chu

DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

Distributed machine learning (ML) training has become a necessity with the prevalence of billion to trillion-parameter-scale models. While prior work has improved training efficiency from the ML perspective at the application layer, it…

Machine Learning · Computer Science 2026-05-05 Zechen Ma, Zixi Qu, Jinyan Yi, David Lin, Yashar Ganjali

DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization

The growth of large language models (LLMs) increases challenges of accelerating distributed training across multiple GPUs in different data centers. Moreover, concerns about data privacy and data exhaustion have heightened interest in…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Zhenheng Tang, Zichen Tang, Junlin Huang, Xinglin Pan, Rudan Yan, Yuxin Wang, Amelie Chi Zhou, Shaohuai Shi, Xiaowen Chu, Bo Li

Communication-Efficient Network-Distributed Optimization with Differential-Coded Compressors

Network-distributed optimization has attracted significant attention in recent years due to its ever-increasing applications. However, the classic decentralized gradient descent (DGD) algorithm is communication-inefficient for large-scale…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-09 Xin Zhang, Jia Liu, Zhengyuan Zhu, Elizabeth S. Bentley

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

Accelerating Decentralized Optimization via Overlapping Local Steps

Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication…

Machine Learning · Computer Science 2026-01-06 Yijie Zhou, Shi Pu

DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling

To reduce uploading bandwidth and address privacy concerns, deep learning at the network edge has been an emerging topic. Typically, edge devices collaboratively train a shared model using real-time generated data through the Parameter…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-11 Shangming Cai, Dongsheng Wang, Haixia Wang, Yongqiang Lyu, Guangquan Xu, Xi Zheng, Athanasios V. Vasilakos

A Novel Coded Computing Approach for Distributed Multi-Task Learning

Distributed multi-task learning (DMTL) effectively improves model generalization performance through the collaborative training of multiple related models. However, in large-scale learning scenarios, communication bottlenecks severely limit…

Information Theory · Computer Science 2025-07-25 Minquan Cheng, Yongkang Wang, Lingyu Zhang, Youlong Wu

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

Communication optimization strategies for distributed deep neural network training: A survey

Recent trends in high-performance computing and deep learning have led to the proliferation of studies on large-scale deep neural network training. However, the frequent communication requirements among computation nodes drastically slows…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-27 Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao

Efficient Decentralized Deep Learning by Dynamic Model Averaging

We propose an efficient protocol for decentralized training of deep neural networks from distributed data sources. The proposed protocol allows to handle different phases of model training equally well and to quickly adapt to concept…

Machine Learning · Computer Science 2018-11-14 Michael Kamp, Linara Adilova, Joachim Sicking, Fabian Hüger, Peter Schlicht, Tim Wirtz, Stefan Wrobel

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

Nowadays, large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines, in which extensive communications between workers pose serious scaling problems. In this article, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-10 Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li

Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu, Thomas Flynn, Shinjae Yoo, Nicholas D'Imperio

Robust and Communication-Efficient Collaborative Learning

We consider a decentralized learning problem, where a set of computing nodes aim at solving a non-convex optimization problem collaboratively. It is well-known that decentralized optimization schemes face two major system bottlenecks:…

Machine Learning · Computer Science 2019-11-04 Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

Learned Gradient Compression for Distributed Deep Learning

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan, Yiming Chen, Giannis Bekoulis, Nikos Deligiannis

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu

Compression and Acceleration of Neural Networks for Communications

Deep learning (DL) has achieved great success in signal processing and communications and has become a promising technology for future wireless communications. Existing works mainly focus on exploiting DL to improve the performance of…

Information Theory · Computer Science 2024-10-30 Jiajia Guo, Jinghe Wang, Chao-Kai Wen, Shi Jin, Geoffrey Ye Li

Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities

The past few years have witnessed the flourishing of large-scale deep neural network models with ever-growing parameter numbers. Training such large-scale models typically requires massive memory and computing resources, necessitating…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-30 Yunze Wei, Tianshuo Hu, Cong Liang, Yong Cui

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments

Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou