Related papers: Decentralized gradient methods: does topology matt…

Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization

In decentralized optimization, nodes cooperate to minimize an overall objective function that is the sum (or average) of per-node private objective functions. Algorithms interleave local computations with communication among all or a subset…

Optimization and Control · Mathematics 2018-01-16 Angelia Nedić, Alex Olshevsky, Michael G. Rabbat

Decentralized Deep Learning using Momentum-Accelerated Consensus

We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset. While there exist several decentralized deep learning approaches, the majority consider a central parameter-server…

Machine Learning · Computer Science 2020-12-01 Aditya Balu, Zhanhong Jiang, Sin Yong Tan, Chinmay Hegde, Young M Lee, Soumik Sarkar

Distributed two-time-scale methods over clustered networks

In this paper, we consider consensus problems over a network of nodes, where the network is divided into a number of clusters. We are interested in the case where the communication topology within each cluster is dense as compared to the…

Systems and Control · Electrical Eng. & Systems 2021-08-27 Thiem V. Pham, Thinh T. Doan, Dinh Hoa Nguyen

Beyond spectral gap: The role of the topology in decentralized learning

In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all…

Machine Learning · Computer Science 2022-11-09 Thijs Vogels, Hadrien Hendrikx, Martin Jaggi

Beyond spectral gap (extended): The role of the topology in decentralized learning

In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. In the decentralized setting, in…

Machine Learning · Computer Science 2023-01-06 Thijs Vogels, Hadrien Hendrikx, Martin Jaggi

On the convergence rate of distributed gradient methods for finite-sum optimization under communication delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems,…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan, Carolyn L. Beck, R. Srikant

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection

Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or…

Machine Learning · Computer Science 2020-02-25 Zhenheng Tang, Shaohuai Shi, Xiaowen Chu

Learned Finite-Time Consensus for Distributed Optimization

Most algorithms for decentralized learning employ a consensus or diffusion mechanism to drive agents to a common solution of a global optimization problem. Generally this takes the form of linear averaging, at a rate of contraction…

Optimization and Control · Mathematics 2024-06-07 Aaron Fainman, Stefan Vlaski

Multi-consensus Decentralized Accelerated Gradient Descent

This paper considers the decentralized convex optimization problem, which has a wide range of applications in large-scale machine learning, sensor networks, and control theory. We propose novel algorithms that achieve optimal computation…

Machine Learning · Computer Science 2023-10-11 Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers

With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each…

Machine Learning · Computer Science 2021-02-15 Guojun Xiong, Gang Yan, Rahul Singh, Jian Li

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi, Martin Jaggi

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

Many popular distributed optimization methods for training machine learning models fit the following template: a local gradient estimate is computed independently by each worker, then communicated to a master, which subsequently performs…

Machine Learning · Computer Science 2019-06-05 Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

Linearly Convergent Algorithm with Variance Reduction for Distributed Stochastic Optimization

This paper considers a distributed stochastic strongly convex optimization, where agents connected over a network aim to cooperatively minimize the average of all agents' local cost functions. Due to the stochasticity of gradient estimation…

Optimization and Control · Mathematics 2020-02-17 Jinlong Lei, Peng Yi, Jie Chen, Yiguang Hong

Impacts of Network Topology on the Performance of a Distributed Algorithm Solving Linear Equations

Recently a distributed algorithm has been proposed for multi-agent networks to solve a system of linear algebraic equations, by assuming each agent only knows part of the system and is able to communicate with nearest neighbors to update…

Optimization and Control · Mathematics 2016-03-15 Hong-Tai Cao, Travis E. Gibson, Shaoshuai Mou, Yang-Yu Liu

Push–Pull with Device Sampling

We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging over an underlying communication graph. Specifically, we place ourselves in an…

Optimization and Control · Mathematics 2023-03-20 Yu-Guan Hsieh, Yassine Laguel, Franck Iutzeler, Jérôme Malick

Cooperative SGD with Dynamic Mixing Matrices

One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A…

Machine Learning · Computer Science 2025-08-22 Soumya Sarkar, Shweta Jain

Decentralized Consensus Algorithm with Delayed and Stochastic Gradients

We analyze the convergence of decentralized consensus algorithm with delayed gradient information across the network. The nodes in the network privately hold parts of the objective function and collaboratively solve for the consensus…

Optimization and Control · Mathematics 2018-01-17 Benjamin Sirb, Xiaojing Ye

Is Consensus Acceleration Possible in Decentralized Optimization over Slowly Time-Varying Networks?

We consider decentralized optimization problems where one aims to minimize a sum of convex smooth objective functions distributed between nodes in the network. The links in the network can change from time to time. For the setting when the…

Optimization and Control · Mathematics 2023-01-30 Dmitriy Metelev, Alexander Rogozin, Dmitry Kovalev, Alexander Gasnikov

A Distributed Optimization Algorithm over Time-Varying Graphs with Efficient Gradient Evaluations

We propose an algorithm for distributed optimization over time-varying communication networks. Our algorithm uses an optimized ratio between the number of rounds of communication and gradient evaluations to achieve fast convergence. The…

Optimization and Control · Mathematics 2020-01-08 Bryan Van Scoy, Laurent Lessard

Distributed Optimization, Averaging via ADMM, and Network Topology

There has been an increasing necessity for scalable optimization methods, especially due to the explosion in the size of datasets and model complexity in modern machine learning applications. Scalable solvers often distribute the…

Optimization and Control · Mathematics 2020-09-08 Guilherme França, José Bento