Related papers: On Biased Compression for Distributed Learning

On Communication Compression for Distributed Optimization on Heterogeneous Data

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two…

Machine Learning · Computer Science 2020-12-23 Sebastian U. Stich

Distributed Methods with Absolute Compression and Error Compensation

Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients,…

Optimization and Control · Mathematics 2022-05-31 Marina Danilova, Eduard Gorbunov

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression

A standard approach in large scale machine learning is distributed stochastic gradient training, which requires the computation of aggregated stochastic gradients over multiple nodes on a network. Communication is a major bottleneck in such…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-24 Hanlin Tang, Xiangru Lian, Chen Yu, Tong Zhang, Ji Liu

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods…

Optimization and Control · Mathematics 2020-06-29 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Biased Compression in Gradient Coding for Distributed Learning

Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression functions and gradient coding. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-18 Chengxi Li, Ming Xiao, Mikael Skoglund

On the Convergence of SGD with Biased Gradients

We analyze the complexity of biased stochastic gradient methods (SGD), where individual updates are corrupted by deterministic, i.e. biased error terms. We derive convergence results for smooth (non-convex) functions and give improved rates…

Machine Learning · Computer Science 2021-05-11 Ahmad Ajalloeian, Sebastian U. Stich

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li

Error Compensated Distributed SGD Can Be Accelerated

Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that…

Optimization and Control · Mathematics 2020-10-02 Xun Qian, Peter Richtárik, Tong Zhang

Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs

Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents,…

Machine Learning · Computer Science 2025-03-03 Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi

Communication Compression for Distributed Learning with Aggregate and Server-Guided Feedback

Distributed learning, particularly Federated Learning (FL), faces a significant bottleneck in the communication cost, particularly the uplink transmission of client-to-server updates, which is often constrained by asymmetric bandwidth…

Machine Learning · Computer Science 2026-02-19 Tomas Ortega, Chun-Yin Huang, Xiaoxiao Li, Hamid Jafarkhani

Escaping Saddle Points with Compressed SGD

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck…

Machine Learning · Computer Science 2021-05-24 Dmitrii Avdiukhin, Grigory Yaroslavtsev

Shifted Compression Framework: Generalizations and Improvements

Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models, and lossy compression of exchanged information, such as stochastic gradients or models, is one of the most effective instruments…

Machine Learning · Computer Science 2022-06-22 Egor Shulgin, Peter Richtárik

On the Utility of Gradient Compression in Distributed Training Systems

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-01 Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance

We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent…

Machine Learning · Computer Science 2024-05-21 Alexander Stollenwerk, Laurent Jacques

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Adaptive Compression for Communication-Efficient Distributed Training

We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel optimization algorithm for communication-efficient training of supervised machine learning models with adaptive compression level. Our approach is inspired by the recently…

Machine Learning · Computer Science 2022-11-02 Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik

Communication-Efficient Distributed Learning with Local Immediate Error Compensation

Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional…

Machine Learning · Computer Science 2024-02-20 Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie Zhang, Dacheng Tao, Enhong Chen