Related papers: Improved asynchronous parallel optimization analys…

ASAGA: Asynchronous Parallel SAGA

We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in…

Optimization and Control · Mathematics 2017-11-09 Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

We introduce and analyze stochastic optimization methods where the input to each gradient update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyze asynchronous implementations of…

Machine Learning · Statistics 2016-03-29 Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

Advances in Asynchronous Parallel and Distributed Optimization

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous…

Machine Learning · Computer Science 2020-06-25 Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet,…

Optimization and Control · Mathematics 2017-11-07 Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

In the realm of big data and machine learning, data-parallel, distributed stochastic algorithms have drawn significant attention in the present days. While the synchronous versions of these algorithms are well understood in terms of their…

Optimization and Control · Mathematics 2020-04-07 Atal Narayan Sahu, Aritra Dutta, Aashutosh Tiwari, Peter Richtárik

Asynchronous Distributed Optimization with Stochastic Delays

We study asynchronous finite sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines -- e.g., modifications of…

Machine Learning · Computer Science 2021-03-11 Margalit Glasgow, Mary Wootters

Parallel and distributed asynchronous adaptive stochastic gradient methods

Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the…

Optimization and Control · Mathematics 2022-04-14 Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jie Chen

First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data

Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy…

Optimization and Control · Mathematics 2026-01-07 Artavazd Maranjyan

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees

We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony…

Optimization and Control · Mathematics 2023-04-04 Hamid Reza Feyzmahdavian, Mikael Johansson

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural network and received many successes in practice recently. However, existing theories cannot explain their convergence and…

Optimization and Control · Mathematics 2019-04-22 Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu

Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization

We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient complexity for finite-sum objectives. We…

Optimization and Control · Mathematics 2021-10-01 Kaiwen Zhou, Anthony Man-Cho So, James Cheng

Fast Asynchronous Parallel Stochastic Gradient Decent

Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD…

Machine Learning · Statistics 2015-08-25 Shen-Yi Zhao, Wu-Jun Li

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper, Franz-Josef Pfreundt

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity

Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on…

Optimization and Control · Mathematics 2026-02-20 Artavazd Maranjyan, Peter Richtárik

ASAP: Asynchronous Approximate Data-Parallel Computation

Emerging workloads, such as graph processing and machine learning are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-28 Asim Kadav, Erik Kruus

Optimal convergence rates of totally asynchronous optimization

Asynchronous optimization algorithms are at the core of modern machine learning and resource allocation systems. However, most convergence results consider bounded information delays and several important algorithms lack guarantees when…

Optimization and Control · Mathematics 2022-03-10 Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

An Analysis of Asynchronous Stochastic Accelerated Coordinate Descent

Gradient descent, and coordinate descent in particular, are core tools in machine learning and elsewhere. Large problem instances are common. To help solve them, two orthogonal approaches are known: acceleration and parallelism. In this…

Optimization and Control · Mathematics 2018-08-16 Richard Cole, Yixin Tao

On Unbounded Delays in Asynchronous Parallel Fixed-Point Algorithms

The need for scalable numerical solutions has motivated the development of asynchronous parallel algorithms, where a set of nodes run in parallel with little or no synchronization, thus computing with delayed information. This paper studies…

Optimization and Control · Mathematics 2017-08-18 Robert Hannah, Wotao Yin

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed…

Optimization and Control · Mathematics 2021-02-05 Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin