Related papers: Variance-based Gradient Compression for Efficient …

CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combining with parallel communication mechanism method like pipeline, gradient…

Machine Learning · Computer Science 2021-09-08 Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

Communication-efficient Variance-reduced Stochastic Gradient Descent

We consider the problem of communication efficient distributed optimization where multiple nodes exchange important algorithm information in every iteration to solve large problems. In particular, we focus on the stochastic variance-reduced…

Machine Learning · Computer Science 2020-03-16 Hossein S. Ghadikolaei, Sindri Magnusson

Temporal Predictive Coding for Gradient Compression in Distributed Learning

This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server…

Information Theory · Computer Science 2024-10-04 Adrian Edin, Zheng Chen, Michel Kieffer, Mikael Johansson

Learned Gradient Compression for Distributed Deep Learning

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan, Yiming Chen, Giannis Bekoulis, Nikos Deligiannis

ErrorCompensatedX: error compensation for variance reduced algorithms

Communication cost is one major bottleneck for the scalability for distributed learning. One approach to reduce the communication cost is to compress the gradient during communication. However, directly compressing the gradient decelerates…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-05 Hanlin Tang, Yao Li, Ji Liu, Ming Yan

Quantizing data for distributed learning

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this…

Machine Learning · Computer Science 2021-09-10 Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification,…

Machine Learning · Computer Science 2020-11-02 Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning

Large-scale deep neural networks (DNN) exhibit excellent performance for various tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters. A main bottleneck is the resulting…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-27 Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher

Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

Although the distributed machine learning methods can speed up the training of large deep neural networks, the communication cost has become the non-negligible bottleneck to constrain the performance. To address this challenge, the gradient…

Machine Learning · Computer Science 2022-01-25 An Xu, Zhouyuan Huo, Heng Huang

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks

The performance and efficiency of distributed training of Deep Neural Networks highly depend on the performance of gradient averaging among all participating nodes, which is bounded by the communication between nodes. There are two major…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-10 Linnan Wang, Wei Wu, Junyu Zhang, Hang Liu, George Bosilca, Maurice Herlihy, Rodrigo Fonseca

Wyner-Ziv Gradient Compression for Federated Learning

Due to limited communication resources at the client and a massive number of model parameters, large-scale distributed learning tasks suffer from communication bottleneck. Gradient compression is an effective method to reduce communication…

Machine Learning · Computer Science 2021-11-17 Kai Liang, Huiru Zhong, Haoning Chen, Youlong Wu

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-20 Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

Optimal Gradient Compression for Distributed and Federated Learning

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent…

Machine Learning · Computer Science 2020-10-08 Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

On the Utility of Gradient Compression in Distributed Training Systems

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-01 Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

Toward Communication Efficient Adaptive Gradient Method

In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of…

Machine Learning · Computer Science 2021-09-14 Xiangyi Chen, Xiaoyun Li, Ping Li

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi, Martin Jaggi