Related papers: Learned Gradient Compression for Distributed Deep …

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks

The performance and efficiency of distributed training of Deep Neural Networks highly depend on the performance of gradient averaging among all participating nodes, which is bounded by the communication between nodes. There are two major…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-10 Linnan Wang, Wei Wu, Junyu Zhang, Hang Liu, George Bosilca, Maurice Herlihy, Rodrigo Fonseca

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the…

Machine Learning · Computer Science 2021-03-19 Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini

RedSync : Reducing Synchronization Traffic for Distributed Deep Learning

Data parallelism has become a dominant method to scale Deep Neural Network (DNN) training across multiple nodes. Since synchronizing a large number of gradients of the local model can be a bottleneck for large-scale distributed training,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-23 Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication

Currently, progressively larger deep neural networks are trained on ever growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in…

Machine Learning · Computer Science 2018-05-23 Felix Sattler, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and…

Machine Learning · Computer Science 2021-04-23 Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan

CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combining with parallel communication mechanism method like pipeline, gradient…

Machine Learning · Computer Science 2021-09-08 Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning

Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models. However, algorithms for decentralized training with compressed…

Machine Learning · Computer Science 2020-10-20 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

Toward Efficient Federated Learning in Multi-Channeled Mobile Edge Network with Layerd Gradient Compression

A fundamental issue for federated learning (FL) is how to achieve optimal model performance under highly dynamic communication environments. This issue can be alleviated by the fact that modern edge devices usually can connect to the edge…

Machine Learning · Computer Science 2021-09-21 Haizhou Du, Xiaojie Feng, Qiao Xiang, Haoyu Liu

RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi, Martin Jaggi

TAGC: Optimizing Gradient Communication in Distributed Transformer Training

The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. A significant bottleneck in this process is gradient…

Machine Learning · Computer Science 2025-04-09 Igor Polyakov, Alexey Dukhanov, Egor Spirin

DNN gradient lossless compression: Can GenNorm be the answer?

In this paper, the problem of optimal gradient lossless compression in Deep Neural Network (DNN) training is considered. Gradient compression is relevant in many distributed DNN training scenarios, including the recently popular federated…

Machine Learning · Computer Science 2021-11-16 Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini

Intrinisic Gradient Compression for Federated Learning

Federated learning is a rapidly-growing area of research which enables a large number of clients to jointly train a machine learning model on privately-held data. One of the largest barriers to wider adoption of federated learning is the…

Machine Learning · Computer Science 2021-12-07 Luke Melas-Kyriazi, Franklyn Wang

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-20 Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

Optimal Gradient Compression for Distributed and Federated Learning

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent…

Machine Learning · Computer Science 2020-10-08 Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

Adaptive Consensus Gradients Aggregation for Scaled Distributed Training

Distributed machine learning has recently become a critical paradigm for training large models on vast datasets. We examine the stochastic optimization problem for deep learning within synchronous parallel computing environments under…

Machine Learning · Computer Science 2024-11-07 Yoni Choukroun, Shlomi Azoulay, Pavel Kisilev