Related papers: Distributed Learning with Compressed Gradient Diff…

Quantizing data for distributed learning

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this…

Machine Learning · Computer Science · 2021-09-10 · Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

MARINA: Faster Non-Convex Distributed Learning with Compression

We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences…

Machine Learning · Computer Science · 2022-01-11 · Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning

Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, it is observed that gradient distributions are heavy-tailed, with…

Machine Learning · Computer Science · 2024-02-07 · Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science · 2018-02-21 · Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Federated Optimization Algorithms with Random Reshuffling and Gradient Compression

Gradient compression is a popular technique for improving communication complexity of stochastic first-order methods in distributed training of machine learning models. However, the existing works consider only with-replacement sampling of…

Machine Learning · Computer Science · 2022-11-04 · Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik

Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization

Massive amounts of data have led to the training of large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science · 2022-03-31 · S Vineeth

The Convergence of Sparsified Gradient Methods

Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization,…

Machine Learning · Computer Science · 2018-09-28 · Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science · 2022-02-25 · Yujia Wang, Lu Lin, Jinghui Chen

Linearly Converging Error Compensated SGD

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and…

Optimization and Control · Mathematics · 2020-10-26 · Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-11-20 · Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics · 2018-11-30 · Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-05-19 · Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees

Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this…

Machine Learning · Computer Science · 2025-07-30 · Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods…

Optimization and Control · Mathematics · 2020-06-29 · Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

A Distributed Training Algorithm of Generative Adversarial Networks with Quantized Gradients

Training generative adversarial networks (GAN) in a distributed fashion is a promising technology since it is contributed to training GAN on a massive of data efficiently in real-world applications. However, GAN is known to be difficult to…

Machine Learning · Computer Science · 2020-10-27 · Xiaojun Chen, Shu Yang, Li Shen, Xuanrong Pang

Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs

Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents,…

Machine Learning · Computer Science · 2025-03-03 · Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data…

Machine Learning · Computer Science · 2018-01-01 · Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li

Efficient Distributed Training through Gradient Compression with Sparsification and Quantization Techniques

This study investigates the impact of gradient compression on distributed training performance, focusing on sparsification and quantization techniques, including top-k, DGC, and QSGD. In baseline experiments, random-k compression results in…

Machine Learning · Computer Science · 2025-02-12 · Shruti Singh, Shantanu Kumar

Taming Momentum in a Distributed Asynchronous Environment

Although distributed computing can significantly reduce the training time of deep neural networks, scaling the training process while maintaining high efficiency and final accuracy is challenging. Distributed asynchronous training enjoys…

Machine Learning · Computer Science · 2020-10-15 · Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster

Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes…

Optimization and Control · Mathematics · 2019-04-11 · Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik