Related papers: On Distributed Adaptive Optimization with Gradient…

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang , Lu Lin , Jinghui Chen

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong , Cong Xie , Shuai Zheng , Haibin Lin

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan , Guangfeng Yan , Yuanzhang Xiao , Linqi Song , Weitao Xu

On the Convergence of Decentralized Adaptive Gradient Methods

Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…

Machine Learning · Computer Science 2021-09-08 Xiangyi Chen , Belhal Karimi , Weijie Zhao , Ping Li

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku , Hiroto Imachi , Takuya Akiba

Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks

Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most well-known…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-10 Xin Zhang , Jia Liu , Zhengyuan Zhu , Elizabeth S. Bentley

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-11 Xiaoge Deng , Dongsheng Li , Tao Sun , Xicheng Lu

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat , Hamid Reza Feyzmahdavian , Mikael Johansson

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang , Vikram Ramanathan , Junshan Zhang , Na Li

Communication-Efficient Federated Learning via Quantized Compressed Sensing

In this paper, we present a communication-efficient federated learning framework inspired by quantized compressed sensing. The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-01 Yongjeong Oh , Namyoon Lee , Yo-Seb Jeon , H. Vincent Poor

Distributed Methods with Absolute Compression and Error Compensation

Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients,…

Optimization and Control · Mathematics 2022-05-31 Marina Danilova , Eduard Gorbunov

Adaptive Compression for Communication-Efficient Distributed Training

We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel optimization algorithm for communication-efficient training of supervised machine learning models with adaptive compression level. Our approach is inspired by the recently…

Machine Learning · Computer Science 2022-11-02 Maksim Makarenko , Elnur Gasanov , Rustem Islamov , Abdurakhmon Sadiev , Peter Richtarik

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods…

Optimization and Control · Mathematics 2020-06-29 Zhize Li , Dmitry Kovalev , Xun Qian , Peter Richtárik

Toward Communication Efficient Adaptive Gradient Method

In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of…

Machine Learning · Computer Science 2021-09-14 Xiangyi Chen , Xiaoyun Li , Ping Li

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification,…

Machine Learning · Computer Science 2020-11-02 Saurabh Agarwal , Hongyi Wang , Kangwook Lee , Shivaram Venkataraman , Dimitris Papailiopoulos

CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combining with parallel communication mechanism method like pipeline, gradient…

Machine Learning · Computer Science 2021-09-08 Enda Yu , Dezun Dong , Yemao Xu , Shuo Ouyang , Xiangke Liao

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve…

Machine Learning · Computer Science 2020-02-19 Thijs Vogels , Sai Praneeth Karimireddy , Martin Jaggi

CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence

In this paper, we consider solving the distributed optimization problem over a multi-agent network under the communication restricted setting. We study a compressed decentralized stochastic gradient method, termed ``compressed exact…

Optimization and Control · Mathematics 2024-10-01 Kun Huang , Shi Pu

Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees

Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data parallel-style…

Machine Learning · Computer Science 2023-03-08 Jue Wang , Binhang Yuan , Luka Rimanic , Yongjun He , Tri Dao , Beidi Chen , Christopher Re , Ce Zhang

Evaluation and Optimization of Gradient Compression for Distributed Deep Learning

To accelerate distributed training, many gradient compression methods have been proposed to alleviate the communication bottleneck in synchronous stochastic gradient descent (S-SGD), but their efficacy in real-world applications still…

Machine Learning · Computer Science 2023-06-16 Lin Zhang , Longteng Zhang , Shaohuai Shi , Xiaowen Chu , Bo Li