Related papers: EControl: Fast Distributed Optimization with Compr…

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across…

Machine Learning · Computer Science 2021-03-16 Samuel Horváth, Peter Richtárik

EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication…

Machine Learning · Computer Science 2021-06-10 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin

Composite Optimization with Error Feedback: the Dual Averaging Approach

Communication efficiency is a central challenge in distributed machine learning training, and message compression is a widely used solution. However, standard Error Feedback (EF) methods (Seide et al., 2014), though effective for smooth…

Optimization and Control · Mathematics 2025-10-07 Yuan Gao, Anton Rodomanov, Jeremy Rack, Sebastian Stich

Safe-EF: Error Feedback for Nonsmooth Constrained Optimization

Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-K) is often preferable in practice but can degrade performance…

Machine Learning · Computer Science 2025-06-04 Rustem Islamov, Yarden As, Ilyas Fatkhullin

Distributed Methods with Absolute Compression and Error Compensation

Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients,…

Optimization and Control · Mathematics 2022-05-31 Marina Danilova, Eduard Gorbunov

EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the…

Machine Learning · Computer Science 2025-06-23 Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

Contractive error feedback for gradient compression

On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited…

Machine Learning · Computer Science 2023-12-15 Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

Accelerated Distributed Optimization with Compression and Error Feedback

Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a…

Optimization and Control · Mathematics 2025-04-01 Yuan Gao, Anton Rodomanov, Jeremy Rack, Sebastian U. Stich

Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness Constants

Error Feedback (EF) is a highly popular and immensely effective mechanism for fixing convergence issues which arise in distributed training methods (such as distributed GD or SGD) when these are enhanced with greedy communication…

Machine Learning · Computer Science 2024-02-19 Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko

Catalyst Acceleration of Error Compensated Methods Leads to Better Communication Complexity

Communication overhead is well known to be a key bottleneck in large scale distributed learning, and a particularly successful class of methods which help to overcome this bottleneck is based on the idea of communication compression. Some…

Optimization and Control · Mathematics 2023-01-25 Xun Qian, Hanze Dong, Tong Zhang, Peter Richtárik

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of…

Machine Learning · Computer Science 2022-02-03 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

Momentum Provably Improves Error Feedback!

Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression…

Machine Learning · Computer Science 2023-10-31 Ilyas Fatkhullin, Alexander Tyurin, Peter Richtárik

ErrorCompensatedX: error compensation for variance reduced algorithms

Communication cost is one major bottleneck for the scalability of distributed learning. One approach to reduce the communication cost is to compress the gradient during communication. However, directly compressing the gradient decelerates…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-05 Hanlin Tang, Yao Li, Ji Liu, Ming Yan

On Communication Compression for Distributed Optimization on Heterogeneous Data

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two…

Machine Learning · Computer Science 2020-12-23 Sebastian U. Stich

Improved Convergence in Parameter-Agnostic Error Feedback through Momentum

Communication compression is essential for scalable distributed training of modern machine learning models, but it often degrades convergence due to the noise it introduces. Error Feedback (EF) mechanisms are widely adopted to mitigate this…

Optimization and Control · Mathematics 2025-11-19 Abdurakhmon Sadiev, Yury Demidovich, Igor Sokolov, Grigory Malinovsky, Sarit Khirirat, Peter Richtárik

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification,…

Machine Learning · Computer Science 2020-11-02 Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

2Direction: Theoretically Faster Distributed Training with Bidirectional Communication Compression

We consider distributed convex optimization problems in the regime when the communication between the server and the workers is expensive in both uplink and downlink directions. We develop a new and provably accelerated method, which we…

Optimization and Control · Mathematics 2023-11-28 Alexander Tyurin, Peter Richtárik

Error Compensated Loopless SVRG, Quartz, and SDCA for Distributed Optimization

The communication of gradients is a key bottleneck in distributed training of large scale machine learning models. In order to reduce the communication cost, gradient compression (e.g., sparsification and quantization) and error…

Optimization and Control · Mathematics 2021-09-22 Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang

Accelerated Methods with Compressed Communications for Distributed Optimization Problems under Data Similarity

In recent years, as data and problem sizes have increased, distributed learning has become an essential tool for training high-performance models. However, the communication bottleneck, especially for high-dimensional data, is a challenge.…

Optimization and Control · Mathematics 2025-04-28 Dmitry Bylinkin, Aleksandr Beznosikov

Error Compensated Distributed SGD Can Be Accelerated

Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that…

Optimization and Control · Mathematics 2020-10-02 Xun Qian, Peter Richtárik, Tong Zhang