Related papers: Gradient Sparsification for Communication-Efficien…

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-11 Xiaoge Deng, Dongsheng Li, Tao Sun, Xicheng Lu

Linearly Convergent Algorithm with Variance Reduction for Distributed Stochastic Optimization

This paper considers a distributed stochastic strongly convex optimization, where agents connected over a network aim to cooperatively minimize the average of all agents' local cost functions. Due to the stochasticity of gradient estimation…

Optimization and Control · Mathematics 2020-02-17 Jinlong Lei, Peng Yi, Jie Chen, Yiguang Hong

Communication-efficient Variance-reduced Stochastic Gradient Descent

We consider the problem of communication efficient distributed optimization where multiple nodes exchange important algorithm information in every iteration to solve large problems. In particular, we focus on the stochastic variance-reduced…

Machine Learning · Computer Science 2020-03-16 Hossein S. Ghadikolaei, Sindri Magnusson

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi, Martin Jaggi

Sparsified SGD with Memory

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders…

Machine Learning · Computer Science 2018-11-30 Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Communication-Efficient Approximate Gradient Coding for Distributed Learning in Heterogeneous Systems

We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that…

Systems and Control · Electrical Eng. & Systems 2026-05-18 Heekang Song, Wan Choi

Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization

Stochastic optimization algorithms update models with cheap per-iteration costs sequentially, which makes them amenable for large-scale data analysis. Such algorithms have been widely studied for structured sparse models where the sparsity…

Machine Learning · Computer Science 2019-05-10 Baojian Zhou, Feng Chen, Yiming Ying

Distributed Learning with Sparse Communications by Identification

In distributed optimization for large-scale learning, a major performance limitation comes from the communications between the different entities. When computations are performed by workers on local data while a coordinator machine…

Optimization and Control · Mathematics 2020-06-26 Dmitry Grishchenko, Franck Iutzeler, Jérôme Malick, Massih-Reza Amini

A Distributed Optimization Algorithm over Time-Varying Graphs with Efficient Gradient Evaluations

We propose an algorithm for distributed optimization over time-varying communication networks. Our algorithm uses an optimized ratio between the number of rounds of communication and gradient evaluations to achieve fast convergence. The…

Optimization and Control · Mathematics 2020-01-08 Bryan Van Scoy, Laurent Lessard

Distributed Stochastic Variance Reduced Gradient Methods and A Lower Bound for Communication Complexity

We study distributed optimization algorithms for minimizing the average of convex functions. The applications include empirical risk minimization problems in statistical machine learning where the datasets are large and have to be stored on…

Optimization and Control · Mathematics 2016-01-07 Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang

Gradient Sparification for Asynchronous Distributed Training

Modern large scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information, such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-25 Zijie Yan

Distributed Stochastic Approximation for Solving Network Optimization Problems Under Random Quantization

We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. This imperfect communication poses a fundamental…

Optimization and Control · Mathematics 2018-10-30 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Communication-efficient Algorithm for Distributed Sparse Learning via Two-way Truncation

We propose a communicationally and computationally efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data and the master machine solves one shifted $l_1$…

Machine Learning · Statistics 2017-09-12 Jineng Ren, Jarvis Haupt

The Convergence of Sparsified Gradient Methods

Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization,…

Machine Learning · Computer Science 2018-09-28 Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize…

Machine Learning · Computer Science 2021-08-06 Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li

Stochastic, Distributed and Federated Optimization for Machine Learning

We study optimization algorithms for the finite sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear…

Machine Learning · Computer Science 2017-07-06 Jakub Konečný

On the Communication Complexity of Decentralized Stochastic Bilevel Optimization

Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several…

Machine Learning · Computer Science 2026-05-26 Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao

Gradient-Free Distributed Optimization with Exact Convergence

In this paper, a gradient-free distributed algorithm is introduced to solve a set constrained optimization problem under a directed communication network. Specifically, at each time-step, the agents locally compute a so-called…

Optimization and Control · Mathematics 2021-09-06 Yipeng Pang, Guoqiang Hu