Related papers: Distributed Learning with Compressed Gradient Diff…

200 papers

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this…

Machine Learning · Computer Science 2021-09-10 Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences…

Machine Learning · Computer Science 2022-01-11 Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, it is observed that gradient distributions are heavy-tailed, with…

Machine Learning · Computer Science 2024-02-07 Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Gradient compression is a popular technique for improving communication complexity of stochastic first-order methods in distributed training of machine learning models. However, the existing works consider only with-replacement sampling of…

Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science 2022-03-31 S Vineeth

Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization,…

Machine Learning · Computer Science 2018-09-28 Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and…

Optimization and Control · Mathematics 2020-10-26 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-20 Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this…

Machine Learning · Computer Science 2025-07-30 Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods…

Optimization and Control · Mathematics 2020-06-29 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Training generative adversarial networks (GANs) in a distributed fashion is a promising technology, since it contributes to training GANs efficiently on massive amounts of data in real-world applications. However, GANs are known to be difficult to…

Machine Learning · Computer Science 2020-10-27 Xiaojun Chen, Shu Yang, Li Shen, Xuanrong Pang

Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents,…

Machine Learning · Computer Science 2025-03-03 Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi

High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data…

Machine Learning · Computer Science 2018-01-01 Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li

This study investigates the impact of gradient compression on distributed training performance, focusing on sparsification and quantization techniques, including top-k, DGC, and QSGD. In baseline experiments, random-k compression results in…

Machine Learning · Computer Science 2025-02-12 Shruti Singh, Shantanu Kumar

Although distributed computing can significantly reduce the training time of deep neural networks, scaling the training process while maintaining high efficiency and final accuracy is challenging. Distributed asynchronous training enjoys…

Machine Learning · Computer Science 2020-10-15 Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes…

Optimization and Control · Mathematics 2019-04-11 Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik