Related papers: Distributed Methods with Absolute Compression and …

Error Compensated Loopless SVRG, Quartz, and SDCA for Distributed Optimization

The communication of gradients is a key bottleneck in distributed training of large scale machine learning models. In order to reduce the communication cost, gradient compression (e.g., sparsification and quantization) and error…

Optimization and Control · Mathematics 2021-09-22 Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang

Linearly Converging Error Compensated SGD

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and…

Optimization and Control · Mathematics 2020-10-26 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps

In this thesis, we propose new theoretical frameworks for the analysis of stochastic and distributed methods with error compensation and local updates. Using these frameworks, we develop more than 20 new optimization methods, including the…

Optimization and Control · Mathematics 2021-12-21 Eduard Gorbunov

Communication-Efficient Distributed Learning with Local Immediate Error Compensation

Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional…

Machine Learning · Computer Science 2024-02-20 Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie Zhang, Dacheng Tao, Enhong Chen

Error Compensated Distributed SGD Can Be Accelerated

Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that…

Optimization and Control · Mathematics 2020-10-02 Xun Qian, Peter Richtárik, Tong Zhang

On Communication Compression for Distributed Optimization on Heterogeneous Data

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two…

Machine Learning · Computer Science 2020-12-23 Sebastian U. Stich

On Biased Compression for Distributed Learning

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show…

Machine Learning · Computer Science 2024-01-17 Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

Error Feedback Fixes SignSGD and other Gradient Compression Schemes

Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples…

Machine Learning · Computer Science 2019-05-30 Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi

Accelerated Sparsified SGD with Error Feedback

A stochastic gradient method for synchronous distributed optimization is studied. To reduce communication cost, we focus in particular on compressing the communicated gradients. Several works have shown that sparsified…

Optimization and Control · Mathematics 2020-06-22 Tomoya Murata, Taiji Suzuki

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods…

Optimization and Control · Mathematics 2020-06-29 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression

A standard approach in large scale machine learning is distributed stochastic gradient training, which requires the computation of aggregated stochastic gradients over multiple nodes on a network. Communication is a major bottleneck in such…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-24 Hanlin Tang, Xiangru Lian, Chen Yu, Tong Zhang, Ji Liu

Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm

We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), a communication-efficient gradient compression algorithm designed for distributed training. LQ-SGD builds on PowerSGD by incorporating the low-rank…

Machine Learning · Computer Science 2025-06-24 Hongyang Li, Lincen Bai, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry

EControl: Fast Distributed Optimization with Compression and Error Control

Modern distributed training relies heavily on communication compression to reduce the communication overhead. In this work, we study algorithms employing a popular class of contractive compressors in order to reduce communication overhead.…

Optimization and Control · Mathematics 2023-11-13 Yuan Gao, Rustem Islamov, Sebastian Stich

On Distributed Adaptive Optimization with Gradient Compression

We study COMP-AMS, a distributed optimization framework based on gradient averaging and adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost in the gradient transmission process.…

Machine Learning · Statistics 2022-05-12 Xiaoyun Li, Belhal Karimi, Ping Li

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm,…

Machine Learning · Computer Science 2020-03-30 Naeimeh Omidvar, Mohammad Ali Maddah-Ali, Hamed Mahdavi

S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems

This paper investigates the problem of large-scale distributed composite convex optimization, with motivations from a broad range of applications, including multi-agent systems, federated learning, smart grids, wireless sensor networks,…

Optimization and Control · Mathematics 2025-12-16 Maoran Wang, Xingju Cai, Yongxin Chen

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve…

Machine Learning · Computer Science 2020-02-19 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi