Related papers: Error Compensated Loopless SVRG, Quartz, and SDCA …

200 papers

Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients,…

Optimization and Control · Mathematics 2022-05-31 Marina Danilova, Eduard Gorbunov

Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that…

Optimization and Control · Mathematics 2020-10-02 Xun Qian, Peter Richtárik, Tong Zhang

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and…

Optimization and Control · Mathematics 2020-10-26 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Communication cost is one major bottleneck for the scalability for distributed learning. One approach to reduce the communication cost is to compress the gradient during communication. However, directly compressing the gradient decelerates…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-05 Hanlin Tang, Yao Li, Ji Liu, Ming Yan

Communication overhead is well known to be a key bottleneck in large scale distributed learning, and a particularly successful class of methods which help to overcome this bottleneck is based on the idea of communication compression. Some…

Optimization and Control · Mathematics 2023-01-25 Xun Qian, Hanze Dong, Tong Zhang, Peter Richtárik

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to…

Machine Learning · Computer Science 2016-10-18 Ohad Shamir

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two…

Machine Learning · Computer Science 2020-12-23 Sebastian U. Stich

Modern distributed training relies heavily on communication compression to reduce the communication overhead. In this work, we study algorithms employing a popular class of contractive compressors in order to reduce communication overhead.…

Optimization and Control · Mathematics 2023-11-13 Yuan Gao, Rustem Islamov, Sebastian Stich

Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional…

Machine Learning · Computer Science 2024-02-20 Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie Zhang, Dacheng Tao, Enhong Chen

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of…

Machine Learning · Computer Science 2022-02-03 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show…

Machine Learning · Computer Science 2024-01-17 Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across…

Machine Learning · Computer Science 2021-03-16 Samuel Horváth, Peter Richtárik

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), an efficient communication gradient compression algorithm designed for distributed training. LQ-SGD further develops on the basis of PowerSGD by incorporating the low-rank…

Machine Learning · Computer Science 2025-06-24 Hongyang Li, Lincen Bai, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry

On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited…

Machine Learning · Computer Science 2023-12-15 Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

The communication overhead has become a significant bottleneck in data-parallel network with the increasing of model size and data samples. In this work, we propose a new algorithm LPC-SVRG with quantized gradients and its acceleration…

Optimization and Control · Mathematics 2019-03-01 Yue Yu, Jiaxiang Wu, Junzhou Huang

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates. We show that the rate of convergence in all cases consists of two…

Machine Learning · Computer Science 2021-06-17 Sebastian U. Stich, Sai Praneeth Karimireddy

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is first a new…

Machine Learning · Computer Science 2020-12-08 Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu