Related papers: Strong error analysis for stochastic gradient desc…

Weak error analysis for stochastic gradient descent optimization algorithms

Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving…

Numerical Analysis · Mathematics · 2020-07-22 · Aritz Bercher, Lukas Gonon, Arnulf Jentzen, Diyora Salimova

Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention…

Numerical Analysis · Mathematics · 2020-10-05 · Arnulf Jentzen, Philippe von Wurstemberger

Stochastic Gradient Descent Revisited

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory, however, often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics · 2025-03-11 · Azar Louzi

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics · 2018-12-27 · Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Statistical Inference for Model Parameters in Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function…

Machine Learning · Statistics · 2023-11-02 · Xi Chen, Jason D. Lee, Xin T. Tong, Yichen Zhang

Bias-Optimal Bounds for SGD: A Computer-Aided Lyapunov Analysis

The non-asymptotic analysis of Stochastic Gradient Descent (SGD) typically yields bounds that decompose into a bias term and a variance term. In this work, we focus on the bias component and study the extent to which SGD can match the…

Optimization and Control · Mathematics · 2026-02-02 · Daniel Cortild, Lucas Ketels, Juan Peypouquet, Guillaume Garrigos

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for…

Machine Learning · Computer Science · 2015-03-19 · Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the…

Optimization and Control · Mathematics · 2022-10-06 · Melinda Hagedorn, Florian Jarre

Stochastic Gradient Descent with Adaptive Data

Stochastic gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios. Its convergence analysis is relatively well understood under the assumption that the data samples are…

Machine Learning · Computer Science · 2024-10-03 · Ethan Che, Jing Dong, Xin T. Tong

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-25 · Dan Alistarh, Christopher De Sa, Nikola Konstantinov

On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats

Deep learning models are dominating almost all artificial intelligence tasks such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, where the computations are usually…

Machine Learning · Computer Science · 2023-01-10 · Matteo Cacciola, Antonio Frangioni, Masoud Asgharian, Alireza Ghaffari, Vahid Partovi Nia

A High Probability Analysis of Adaptive SGD with Momentum

Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning applications. In particular, SGD with adaptive learning rates and momentum is the industry standard to train deep networks. Despite the…

Machine Learning · Statistics · 2020-07-29 · Xiaoyu Li, Francesco Orabona

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness…

Machine Learning · Computer Science · 2013-01-01 · Ohad Shamir, Tong Zhang

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science · 2019-12-24 · Jie Chen, Ronny Luss

Online stochastic gradient descent on non-convex losses from high-dimensional inference

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively…

Machine Learning · Statistics · 2023-06-23 · Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

A Variational Analysis of Stochastic Gradient Algorithms

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show…

Machine Learning · Statistics · 2017-09-12 · Stephan Mandt, Matthew D. Hoffman, David M. Blei

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show it attains good generalization…

Machine Learning · Computer Science · 2024-09-17 · Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…

Machine Learning · Statistics · 2026-05-26 · Jose Blanchet, Peter Glynn, Wenhao Yang

Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods

Differentially private stochastic gradient descent (DP-SGD) has become the standard algorithm for training machine learning models with rigorous privacy guarantees. Despite its widespread use, the theoretical understanding of its long-run…

Machine Learning · Computer Science · 2025-11-21 · Amartya Mukherjee, Jun Liu

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone…

Machine Learning · Statistics · 2022-04-19 · Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry