Related papers: Error Compensated Loopless SVRG, Quartz, and SDCA …

Distributed Methods with Absolute Compression and Error Compensation

Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients,…

Optimization and Control · Mathematics 2022-05-31 Marina Danilova, Eduard Gorbunov

Error Compensated Distributed SGD Can Be Accelerated

Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that…

Optimization and Control · Mathematics 2020-10-02 Xun Qian, Peter Richtárik, Tong Zhang

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

Linearly Converging Error Compensated SGD

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and…

Optimization and Control · Mathematics 2020-10-26 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

ErrorCompensatedX: error compensation for variance reduced algorithms

Communication cost is one major bottleneck for the scalability for distributed learning. One approach to reduce the communication cost is to compress the gradient during communication. However, directly compressing the gradient decelerates…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-05 Hanlin Tang, Yao Li, Ji Liu, Ming Yan

Catalyst Acceleration of Error Compensated Methods Leads to Better Communication Complexity

Communication overhead is well known to be a key bottleneck in large scale distributed learning, and a particularly successful class of methods which help to overcome this bottleneck is based on the idea of communication compression. Some…

Optimization and Control · Mathematics 2023-01-25 Xun Qian, Hanze Dong, Tong Zhang, Peter Richtárik

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to…

Machine Learning · Computer Science 2016-10-18 Ohad Shamir

On Communication Compression for Distributed Optimization on Heterogeneous Data

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two…

Machine Learning · Computer Science 2020-12-23 Sebastian U. Stich

EControl: Fast Distributed Optimization with Compression and Error Control

Modern distributed training relies heavily on communication compression to reduce the communication overhead. In this work, we study algorithms employing a popular class of contractive compressors in order to reduce communication overhead.…

Optimization and Control · Mathematics 2023-11-13 Yuan Gao, Rustem Islamov, Sebastian Stich

Communication-Efficient Distributed Learning with Local Immediate Error Compensation

Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional…

Machine Learning · Computer Science 2024-02-20 Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie Zhang, Dacheng Tao, Enhong Chen

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of…

Machine Learning · Computer Science 2022-02-03 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

On Biased Compression for Distributed Learning

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show…

Machine Learning · Computer Science 2024-01-17 Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across…

Machine Learning · Computer Science 2021-03-16 Samuel Horváth, Peter Richtárik

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm

We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), an efficient communication gradient compression algorithm designed for distributed training. LQ-SGD further develops on the basis of PowerSGD by incorporating the low-rank…

Machine Learning · Computer Science 2025-06-24 Hongyang Li, Lincen Bai, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry

Contractive error feedback for gradient compression

On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited…

Machine Learning · Computer Science 2023-12-15 Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

Exploring Fast and Communication-Efficient Algorithms in Large-scale Distributed Networks

The communication overhead has become a significant bottleneck in data-parallel network with the increasing of model size and data samples. In this work, we propose a new algorithm LPC-SVRG with quantized gradients and its acceleration…

Optimization and Control · Mathematics 2019-03-01 Yue Yu, Jiaxiang Wu, Junzhou Huang

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates. We show that the rate of convergence in all cases consists of two…

Machine Learning · Computer Science 2021-06-17 Sebastian U. Stich, Sai Praneeth Karimireddy

CSER: Communication-efficient SGD with Error Reset

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is first a new…

Machine Learning · Computer Science 2020-12-08 Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu