Related papers: Differential Equations for Modeling Asynchronous A…

Guided parallelized stochastic gradient descent for delay compensation

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science 2024-02-13 Anuraganand Sharma

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper , Franz-Josef Pfreundt

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi , Ahmed Hefny , Suvrit Sra , Barnabás Póczos , Alex Smola

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…

Machine Learning · Computer Science 2026-05-14 Ammar Mahran , Artavazd Maranjyan , Peter Richtárik

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show it attains well generalization…

Machine Learning · Computer Science 2024-09-17 Haihan Zhang , Yuanshi Liu , Qianwen Chen , Cong Fang

Stochastic modified equations for the asynchronous stochastic gradient descent

We propose a stochastic modified equations (SME) for modeling the asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the…

Machine Learning · Statistics 2020-03-04 Jing An , Jianfeng Lu , Lexing Ying

Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data

We consider the distributed learning problem with data dispersed across multiple workers under the orchestration of a central server. Asynchronous Stochastic Gradient Descent (SGD) has been widely explored in such a setting to reduce the…

Machine Learning · Computer Science 2024-05-28 Xiaolu Wang , Yuchang Sun , Hoi-To Wai , Jun Zhang

Convergence Analysis of Decentralized ASGD

Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD still is a time-consuming…

Machine Learning · Computer Science 2025-12-01 Mauro DL Tosi , Martin Theobald

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile , Matteo Raviola , Nathan Schaeffer

A Sharp Convergence Rate for the Asynchronous Stochastic Gradient Descent

We give a sharp convergence rate for the asynchronous stochastic gradient descent (ASGD) algorithms when the loss function is a perturbed quadratic function based on the stochastic modified equations introduced in [An et al. Stochastic…

Numerical Analysis · Mathematics 2020-01-27 Yuhua Zhu , Lexing Ying

Stochastic Gradient Descent Revisited

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics 2025-03-11 Azar Louzi

On the Convergence of mSGD and AdaGrad for Stochastic Optimization

As one of the most fundamental stochastic optimization algorithms, stochastic gradient descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade. There have been some modified SGD-type…

Machine Learning · Computer Science 2022-01-28 Ruinan Jin , Yu Xing , Xingkang He

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees…

Optimization and Control · Mathematics 2023-04-21 Konstantin Mishchenko , Francis Bach , Mathieu Even , Blake Woodworth

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (\textsc{Sgd}) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng , Yi Cheng , Guanghui Lan

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-25 Dan Alistarh , Christopher De Sa , Nikola Konstantinov

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science 2025-02-10 Cabrel Teguemne Fokam , Khaleelulla Khan Nazeer , Lukas König , David Kappel , Anand Subramoney

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu

Is Stochastic Gradient Descent Effective? A PDE Perspective on Machine Learning processes

In this paper we analyze the behaviour of the stochastic gradient descent (SGD), a widely used method in supervised learning for optimizing neural network weights via a minimization of non-convex loss functions. Since the pioneering work of…

Machine Learning · Computer Science 2025-05-13 Davide Barbieri , Matteo Bonforte , Peio Ibarrondo

Asymptotic Analysis of Conditioned Stochastic Gradient Descent

In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called Conditioned SGD, based on a preconditioning of the gradient direction. Using a discrete-time approach with martingale tools, we establish…

Statistics Theory · Mathematics 2023-10-17 Rémi Leluc , François Portier