Related papers: Distributed learning with compressed gradients

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku , Hiroto Imachi , Takuya Akiba

On the convergence rate of distributed gradient methods for finite-sum optimization under communication delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems,…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan , Carolyn L. Beck , R. Srikant

CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combining with parallel communication mechanism method like pipeline, gradient…

Machine Learning · Computer Science 2021-09-08 Enda Yu , Dezun Dong , Yemao Xu , Shuo Ouyang , Xiangke Liao

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu , Weidong Huang , Junzhou Huang , Tong Zhang

Communication-efficient Variance-reduced Stochastic Gradient Descent

We consider the problem of communication efficient distributed optimization where multiple nodes exchange important algorithm information in every iteration to solve large problems. In particular, we focus on the stochastic variance-reduced…

Machine Learning · Computer Science 2020-03-16 Hossein S. Ghadikolaei , Sindri Magnusson

Distributed Delayed Stochastic Optimization

We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization…

Optimization and Control · Mathematics 2011-05-02 Alekh Agarwal , John C. Duchi

On the Utility of Gradient Compression in Distributed Training Systems

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-01 Saurabh Agarwal , Hongyi Wang , Shivaram Venkataraman , Dimitris Papailiopoulos

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong , Cong Xie , Shuai Zheng , Haibin Lin

Accelerated Methods with Compressed Communications for Distributed Optimization Problems under Data Similarity

In recent years, as data and problem sizes have increased, distributed learning has become an essential tool for training high-performance models. However, the communication bottleneck, especially for high-dimensional data, is a challenge.…

Optimization and Control · Mathematics 2025-04-28 Dmitry Bylinkin , Aleksandr Beznosikov

Compression for Distributed Optimization and Timely Updates

The goal of this thesis is to study the compression problems arising in distributed computing systematically. In the first part of the thesis, we study gradient compression for distributed first-order optimization. We begin by establishing…

Information Theory · Computer Science 2023-01-12 Prathamesh Mayekar

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi , Martin Jaggi

On Biased Compression for Distributed Learning

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show…

Machine Learning · Computer Science 2024-01-17 Aleksandr Beznosikov , Samuel Horváth , Peter Richtárik , Mher Safaryan

Learned Gradient Compression for Distributed Deep Learning

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan , Yiming Chen , Giannis Bekoulis , Nikos Deligiannis

Optimal Gradient Compression for Distributed and Federated Learning

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent…

Machine Learning · Computer Science 2020-10-08 Alyazeed Albasyoni , Mher Safaryan , Laurent Condat , Peter Richtárik

Federated Learning with Compression: Unified Analysis and Sharp Guarantees

In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and…

Machine Learning · Computer Science 2020-11-24 Farzin Haddadpour , Mohammad Mahdi Kamani , Aryan Mokhtari , Mehrdad Mahdavi

Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization

Massive amounts of data have led to the training of large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science 2022-03-31 S Vineeth

Distributed Optimization with Quantized Gradient Descent

In this paper, we consider the unconstrained distributed optimization problem, in which the exchange of information in the network is captured by a directed graph topology, thus, nodes can only communicate with their neighbors.…

Systems and Control · Electrical Eng. & Systems 2023-12-07 Apostolos I. Rikos , Wei Jiang , Themistoklis Charalambous , Karl H. Johansson

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization

In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression…

Machine Learning · Computer Science 2025-08-08 Wei Liu , Anweshit Panda , Ujwal Pandey , Christopher Brissette , Yikang Shen , George M. Slota , Naigang Wang , Jie Chen , Yangyang Xu

Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning

Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, it is observed that gradient distributions are heavy-tailed, with…

Machine Learning · Computer Science 2024-02-07 Guangfeng Yan , Tan Li , Yuanzhang Xiao , Hanxu Hou , Linqi Song

Convergence of Limited Communications Gradient Methods

Distributed optimization increasingly plays a central role in economical and sustainable operation of cyber-physical systems. Nevertheless, the complete potential of the technology has not yet been fully exploited in practice due to…

Optimization and Control · Mathematics 2017-10-24 Sindri Magnusson , Chinwendu Enyioha , Na Li , Carlo Fischione , Vahid Tarokh