Related papers: Why Does Multi-Epoch Training Help?

200 papers

Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant…

Machine Learning · Computer Science 2022-03-08 Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex…

Machine Learning · Computer Science 2023-01-13 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on…

Machine Learning · Computer Science 2018-11-26 Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency…

Machine Learning · Statistics 2022-06-16 Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Stochastic gradient descent (SGD) with stochastic momentum is popular in nonconvex stochastic optimization and particularly for the training of deep neural networks. In standard SGD, parameters are updated by improving along the path of the…

Machine Learning · Computer Science 2021-06-08 Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

We study the out-of-sample performance of multi-pass stochastic gradient descent (SGD) in the fundamental stochastic convex optimization (SCO) model. While one-pass SGD is known to achieve an optimal $\Theta(1/\sqrt{n})$ excess population…

Machine Learning · Computer Science 2025-05-16 Shira Vansover-Hager, Tomer Koren, Roi Livni

Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an…

Machine Learning · Computer Science 2021-07-13 Satyen Kale, Ayush Sekhari, Karthik Sridharan

Neural networks are usually trained by some form of stochastic gradient descent (SGD). A number of strategies are in common use intended to improve SGD optimization, such as learning rate schedules, momentum, and batching. These are…

Neural and Evolutionary Computing · Computer Science 2015-08-13 Thomas M. Breuel

Stagewise training strategy is widely used for learning neural networks, which runs a stochastic algorithm (e.g., SGD) starting with a relatively large step size (aka learning rate) and geometrically decreasing the step size after a number…

Machine Learning · Statistics 2019-02-05 Zhuoning Yuan, Yan Yan, Rong Jin, Tianbao Yang

We study stochastic gradient descent (SGD) for composite optimization problems with $N$ sequential operators subject to perturbations in both the forward and backward passes. Unlike classical analyses that treat gradient noise as additive…

Optimization and Control · Mathematics 2026-02-25 Boao Kong, Hengrui Zhang, Kun Yuan

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear…

Machine Learning · Computer Science 2021-12-08 Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion

It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. However, beyond linear regression, the theoretical advantage of full-batch gradient descent (GD, which always reuses…

Machine Learning · Statistics 2026-02-03 Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any…

Machine Learning · Statistics 2013-02-19 Tom Schaul, Sixin Zhang, Yann LeCun

In this paper we analyze the behaviour of the stochastic gradient descent (SGD), a widely used method in supervised learning for optimizing neural network weights via a minimization of non-convex loss functions. Since the pioneering work of…

Machine Learning · Computer Science 2025-05-13 Davide Barbieri, Matteo Bonforte, Peio Ibarrondo

For large scale learning problems, it is desirable if we can obtain the optimal model parameters by going through the data in only one pass. Polyak and Juditsky (1992) showed that asymptotically the test performance of the simple average of…

Machine Learning · Computer Science 2011-12-23 Wei Xu

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with…

Machine Learning · Computer Science 2021-06-03 Itay Safran, Ohad Shamir

The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the…

Machine Learning · Computer Science 2020-02-25 Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras