Related papers: Tight Time Complexities in Parallel Stochastic Opt…

On the Optimal Time Complexities in Decentralized Stochastic Asynchronous Optimization

We consider the decentralized stochastic asynchronous optimization setup, where many workers asynchronously calculate stochastic gradients and asynchronously communicate with each other using edges in a multigraph. For both homogeneous and…

Optimization and Control · Mathematics 2024-11-05 Alexander Tyurin , Peter Richtárik

Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity

Asynchronous Stochastic Gradient Descent (Asynchronous SGD) is a cornerstone method for parallelizing learning in distributed machine learning. However, its performance suffers under arbitrarily heterogeneous computation times across…

Machine Learning · Computer Science 2025-06-04 Artavazd Maranjyan , Alexander Tyurin , Peter Richtárik

Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity

Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on…

Optimization and Control · Mathematics 2026-02-20 Artavazd Maranjyan , Peter Richtárik

Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous Methods

Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization. We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-04 Grigory Begunov , Alexander Tyurin

First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data

Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy…

Optimization and Control · Mathematics 2026-01-07 Artavazd Maranjyan

Optimal Time Complexities of Parallel Stochastic Optimization Methods Under a Fixed Computation Model

Parallelization is a popular strategy for improving the performance of iterative algorithms. Optimization methods are no exception: design of efficient parallel optimization methods and tight analysis of their theoretical properties are…

Optimization and Control · Mathematics 2023-11-28 Alexander Tyurin , Peter Richtárik

Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity

We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server can not be ignored, and the computation and communication times are potentially…

Optimization and Control · Mathematics 2024-11-05 Alexander Tyurin , Marta Pozzi , Ivan Ilin , Peter Richtárik

The Minimax Complexity of Distributed Optimization

In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the "graph oracle model", an extension of the classic oracle complexity framework that can be applied to study distributed…

Optimization and Control · Mathematics 2021-09-03 Blake Woodworth

Asynchronous Distributed Optimization with Stochastic Delays

We study asynchronous finite sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines -- e.g., modifications of…

Machine Learning · Computer Science 2021-03-11 Margalit Glasgow , Mary Wootters

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

In the realm of big data and machine learning, data-parallel, distributed stochastic algorithms have drawn significant attention in the present days.~While the synchronous versions of these algorithms are well understood in terms of their…

Optimization and Control · Mathematics 2020-04-07 Atal Narayan Sahu , Aritra Dutta , Aashutosh Tiwari , Peter Richtárik

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…

Machine Learning · Computer Science 2026-05-14 Ammar Mahran , Artavazd Maranjyan , Peter Richtárik

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than…

Optimization and Control · Mathematics 2026-05-12 Zhirayr Tovmasyan , Artavazd Maranjyan , Peter Richtárik

MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times

We investigate the problem of minimizing the expectation of smooth nonconvex functions in a distributed setting with multiple parallel workers that are able to compute stochastic gradients. A significant challenge in this context is the…

Optimization and Control · Mathematics 2025-06-16 Artavazd Maranjyan , Omar Shaikh Omar , Peter Richtárik

Optimality in Decentralized Optimization under Bandwidth Constraints

We consider a realistic decentralized setup with bandwidth-constrained communication and derive optimal time complexities for non-convex stochastic parallel and asynchronous optimization (up to logarithmic factors). We develop the…

Optimization and Control · Mathematics 2026-03-24 Alexander Tyurin

Advances in Asynchronous Parallel and Distributed Optimization

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous…

Machine Learning · Computer Science 2020-06-25 Mahmoud Assran , Arda Aytekin , Hamid Feyzmahdavian , Mikael Johansson , Michael Rabbat

Asynchronous Distributed Semi-Stochastic Gradient Optimization

With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang , Shuai Zheng , James T. Kwok

Continuous-time Models for Stochastic Optimization Algorithms

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis…

Optimization and Control · Mathematics 2020-03-12 Antonio Orvieto , Aurelien Lucchi

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics 2016-05-24 Yitan Li , Linli Xu , Xiaowei Zhong , Qing Ling

Distributed stochastic optimization with large delays

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou , Panayotis Mertikopoulos , Nicholas Bambos , Peter W. Glynn , Yinyu Ye

On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization

For SGD based distributed stochastic optimization, computation complexity, measured by the convergence rate in terms of the number of stochastic gradient calls, and communication complexity, measured by the number of inter-node…

Optimization and Control · Mathematics 2019-05-14 Hao Yu , Rong Jin