Related papers: Amortized Analysis on Asynchronous Gradient Descen…

Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and…

Machine Learning · Computer Science · 2021-06-16 · Tian Tong, Cong Ma, Yuejie Chi

Differential Equations for Modeling Asynchronous Algorithms

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analyses of ASGD take a discrete view and prove upper bounds on its convergence rate. However, the…

Machine Learning · Statistics · 2018-05-09 · Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Guided parallelized stochastic gradient descent for delay compensation

The stochastic gradient descent (SGD) algorithm and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science · 2024-02-13 · Anuraganand Sharma

Adaptive Step-Size Methods for Compressed SGD

Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning.…

Machine Learning · Statistics · 2022-07-21 · Adarsh M. Subramaniam, Akshayaa Magesh, Venugopal V. Veeravalli

A Note on the Gradient-Evaluation Sequence in Accelerated Gradient Methods

Nesterov's accelerated gradient descent method (AGD) is a seminal deterministic first-order method known to achieve the optimal order of iteration complexity for solving convex smooth optimization problems. Two distinct sequences of…

Optimization and Control · Mathematics · 2026-03-10 · Yan Wu, Yipeng Zhang, Lu Liu, Yuyuan Ouyang

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…

Machine Learning · Computer Science · 2026-05-14 · Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-25 · Dan Alistarh, Christopher De Sa, Nikola Konstantinov

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science · 2025-02-10 · Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

Asynchronous Stochastic Gradient Descent with Delay Compensation

With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, which is,…

Machine Learning · Computer Science · 2020-02-19 · Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu

Solving Convex Smooth Function Constrained Optimization Is Almost As Easy As Unconstrained Optimization

While Nesterov's Accelerated Gradient Descent (AGD) efficiently solves constrained problems when the constraint set $X \subseteq \mathbb{R}^n$ is simple and easy to project onto, it remains an open question whether function-constrained…

Optimization and Control · Mathematics · 2025-12-02 · Zhe Zhang, Guanghui Lan

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-10-06 · Janis Keuper, Franz-Josef Pfreundt

A Unified Approach to Analyzing Asynchronous Coordinate Descent and Tatonnement

This paper concerns asynchrony in iterative processes, focusing on gradient descent and tatonnement, a fundamental price dynamic. Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically,…

Optimization and Control · Mathematics · 2016-12-30 · Yun Kuen Cheung, Richard Cole

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates…

Machine Learning · Computer Science · 2026-05-18 · Manuel Graca, L. Miguel Silveira, Arlindo Oliveira, Frank Liu

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees…

Optimization and Control · Mathematics · 2023-04-21 · Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

Asynchronous Fully-Decentralized SGD in the Cluster-Based Model

This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function $Q$, as a core part of optimization and learning algorithms. Our algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-14 · Hagit Attiya, Noa Schiller

MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-11-11 · Karl Bäckström, Marina Papatriantafilou, Philippas Tsigas

AEGD: Adaptive Gradient Descent with Energy

We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the…

Optimization and Control · Mathematics · 2021-10-04 · Hailiang Liu, Xuping Tian

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science · 2019-12-24 · Jie Chen, Ronny Luss

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic convergence rates. We show that the rate of convergence in all cases consists of two…

Machine Learning · Computer Science · 2021-06-17 · Sebastian U. Stich, Sai Praneeth Karimireddy

AutoGD: Automatic Learning Rate Selection for Gradient Descent

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science · 2025-10-14 · Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell