Papers related to "Accumulated Gradient Normalization"

Faster Asynchronous SGD

Asynchronous distributed stochastic gradient descent methods have trouble converging because of stale gradients. A gradient update sent to a parameter server by a client is stale if the parameters used to calculate that gradient have since…

Machine Learning · Statistics 2016-01-18 Augustus Odena

Distributed Delayed Stochastic Optimization

We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization…

Optimization and Control · Mathematics 2011-05-02 Alekh Agarwal, John C. Duchi

Optimization Trade-offs in Asynchronous Federated Learning: A Stochastic Networks Approach

Synchronous federated learning scales poorly due to the straggler effect. Asynchronous algorithms increase the update throughput by processing updates upon arrival, but they introduce two fundamental challenges: gradient staleness, which…

Machine Learning · Computer Science 2026-03-30 Abdelkrim Alahyane, Céline Comte, Matthieu Jonckheere

Accelerated Alternating Direction Method of Multipliers Gradient Tracking for Distributed Optimization

This paper presents a novel accelerated distributed algorithm for unconstrained consensus optimization over static undirected networks. The proposed algorithm combines the benefits of acceleration from momentum, the robustness of the…

Optimization and Control · Mathematics 2024-05-15 Eduardo Sebastián, Mauro Franceschelli, Andrea Gasparri, Eduardo Montijano, Carlos Sagüés

Straggler-Robust Distributed Optimization with the Parameter Server Utilizing Coded Gradient

Optimization in distributed networks plays a central role in almost all distributed machine learning problems. In principle, the use of distributed task allocation has reduced the computational time, allowing better response rates and…

Optimization and Control · Mathematics 2020-07-28 Elie Atallah, Nazanin Rahnavard, Chinwendu Enyioha

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly…

Optimization and Control · Mathematics 2016-10-19 Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson

ADMM-Tracking Gradient for Distributed Optimization over Asynchronous and Unreliable Networks

In this paper, we propose a novel distributed algorithm for consensus optimization over networks and a robust extension tailored to deal with asynchronous agents and packet losses. Indeed, to robustly achieve dynamic consensus on the…

Optimization and Control · Mathematics 2025-09-04 Guido Carnevale, Nicola Bastianello, Giuseppe Notarstefano, Ruggero Carli

Distributed SGD Generalizes Well Under Asynchrony

The performance of fully synchronized distributed systems has faced a bottleneck due to the big data trend, under which asynchronous distributed systems are becoming increasingly popular due to their powerful scalability. In this paper, we…

Machine Learning · Statistics 2019-10-01 Jayanth Regatti, Gaurav Tendolkar, Yi Zhou, Abhishek Gupta, Yingbin Liang

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without…

Machine Learning · Computer Science 2017-08-28 Shun Zheng, Jialei Wang, Fen Xia, Wei Xu, Tong Zhang

Distributed stochastic optimization with large delays

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers

With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each…

Machine Learning · Computer Science 2021-02-15 Guojun Xiong, Gang Yan, Rahul Singh, Jian Li

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

Many popular distributed optimization methods for training machine learning models fit the following template: a local gradient estimate is computed independently by each worker, then communicated to a master, which subsequently performs…

Machine Learning · Computer Science 2019-06-05 Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

Straggler-Robust Distributed Optimization in Parameter-Server Networks

Optimization in distributed networks plays a central role in almost all distributed machine learning problems. In principle, the use of distributed task allocation has reduced the computational time, allowing better response rates and…

Optimization and Control · Mathematics 2021-08-23 Elie Atallah, Nazanin Rahnavard, Chinwendu Enyioha

Advances in Asynchronous Parallel and Distributed Optimization

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous…

Machine Learning · Computer Science 2020-06-25 Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

Adaptive Sequential Optimization with Applications to Machine Learning

A framework is introduced for solving a sequence of slowly changing optimization problems, including those arising in regression and classification applications, using optimization algorithms such as stochastic gradient descent (SGD). The…

Machine Learning · Computer Science 2015-09-25 Craig Wilson, Venugopal V. Veeravalli

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence

Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some…

Machine Learning · Computer Science 2025-11-03 Matin Ansaripour, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh

Adaptive Sequential Stochastic Optimization

A framework is introduced for sequentially solving convex stochastic minimization problems, where the objective functions change slowly, in the sense that the distance between successive minimizers is bounded. The minimization problems are…

Optimization and Control · Mathematics 2018-03-12 Craig Wilson, Venugopal Veeravalli, Angelia Nedich

Slow and Stale Gradients Can Win the Race

Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness…

Machine Learning · Statistics 2020-03-25 Sanghamitra Dutta, Jianyu Wang, Gauri Joshi

Bringing Order to Asynchronous SGD: Towards Optimality under Data-Dependent Delays with Momentum

Asynchronous stochastic gradient descent (SGD) enables scalable distributed training but suffers from gradient staleness. Existing mitigation strategies, such as delay-adaptive learning rates and staleness-aware filtering, typically…

Machine Learning · Computer Science 2026-05-15 Tehila Dahan, Roie Reshef, Sharon Goldstein, Kfir Y. Levy

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have recently drawn significant attention from both academia and industry. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics 2016-05-24 Yitan Li, Linli Xu, Xiaowei Zhong, Qing Ling