Related papers: Weak error analysis for stochastic gradient descen…

Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention…

Numerical Analysis · Mathematics 2020-10-05 Arnulf Jentzen , Philippe von Wurstemberger

Strong error analysis for stochastic gradient descent optimization algorithms

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove…

Numerical Analysis · Mathematics 2020-10-05 Arnulf Jentzen , Benno Kuckuck , Ariel Neufeld , Philippe von Wurstemberger

Stochastic Gradient Descent Revisited

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics 2025-03-11 Azar Louzi

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen , Nam H. Nguyen , Dzung T. Phan , Jayant R. Kalagnanam , Katya Scheinberg

On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats

Deep learning models are dominating almost all artificial intelligence tasks such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, where the computations are usually…

Machine Learning · Computer Science 2023-01-10 Matteo Cacciola , Antonio Frangioni , Masoud Asgharian , Alireza Ghaffari , Vahid Partovi Nia

Statistical Inference for Model Parameters in Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function…

Machine Learning · Statistics 2023-11-02 Xi Chen , Jason D. Lee , Xin T. Tong , Yichen Zhang

Eigen-componentwise convergence of SGD on quadratic programming

Stochastic gradient descent (SGD) is a workhorse algorithm for solving large-scale optimization problems in data science and machine learning. Understanding the convergence of SGD is hence of fundamental importance. In this work we examine…

Numerical Analysis · Mathematics 2024-12-11 Lehan Chen , Yuji Nakatsukasa

Error Lower Bounds of Constant Step-size Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) plays a central role in modern machine learning. While there is extensive work on providing error upper bound for SGD, not much is known about SGD error lower bound. In this paper, we study the convergence…

Optimization and Control · Mathematics 2019-10-21 Zhiyan Ding , Yiding Chen , Qin Li , Xiaojin Zhu

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis

Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning. The noise encountered in these applications is different from that in many theoretical analyses of stochastic gradient algorithms. In this…

Machine Learning · Statistics 2021-09-16 Stephan Wojtowytsch

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show it attains well generalization…

Machine Learning · Computer Science 2024-09-17 Haihan Zhang , Yuanshi Liu , Qianwen Chen , Cong Fang

Improving the convergence of SGD through adaptive batch sizes

Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science 2023-09-28 Scott Sievert , Shrey Shah

Online stochastic gradient descent on non-convex losses from high-dimensional inference

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively…

Machine Learning · Statistics 2023-06-23 Gerard Ben Arous , Reza Gheissari , Aukosh Jagannath

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin , Ohad Shamir , Karthik Sridharan

On the Convergence of SGD Training of Neural Networks

Neural networks are usually trained by some form of stochastic gradient descent (SGD)). A number of strategies are in common use intended to improve SGD optimization, such as learning rate schedules, momentum, and batching. These are…

Neural and Evolutionary Computing · Computer Science 2015-08-13 Thomas M. Breuel

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from a viewpoint of graduated optimization, which is a widely applied approach for non-convex…

Optimization and Control · Mathematics 2023-08-15 Da Li , Jingjing Wu , Qingrun Zhang

Analysis of Biased Stochastic Gradient Descent Using Sequential Semidefinite Programs

We present a convergence rate analysis for biased stochastic gradient descent (SGD), where individual gradient updates are corrupted by computation errors. We develop stochastic quadratic constraints to formulate a small linear matrix…

Optimization and Control · Mathematics 2020-03-31 Bin Hu , Peter Seiler , Laurent Lessard

The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent

The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models, in particular deep learning models. We study SGD dynamics under linear regression and two-layer linear networks, with an easy…

Optimization and Control · Mathematics 2020-04-29 Xin Qian , Diego Klabjan

Differential Equations for Modeling Asynchronous Algorithms

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analysis on ASGD take a discrete view and prove upper bounds for their convergence rates. However, the…

Machine Learning · Statistics 2018-05-09 Li He , Qi Meng , Wei Chen , Zhi-Ming Ma , Tie-Yan Liu

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics 2017-11-16 Alberto Bietti , Julien Mairal

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu