Related papers: SGD Implicitly Regularizes Generalization Error

A Bootstrap Perspective on Stochastic Gradient Descent

Machine learning models trained with stochastic gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the…

Machine Learning · Computer Science 2025-12-09 Hongjian Lan, Yucong Liu, Florian Schäfer

On the Origin of Implicit Regularization in Stochastic Gradient Descent

For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of gradient flow on the full batch loss function. However moderately large learning rates can achieve higher test accuracies, and this generalization…

Machine Learning · Computer Science 2021-01-29 Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

SGD Generalizes Better Than GD (And Regularization Doesn't Help)

We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD) in the fundamental stochastic convex optimization model. While for SGD it is well-known that…

Machine Learning · Computer Science 2021-07-01 Idan Amir, Tomer Koren, Roi Livni

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied…

Machine Learning · Statistics 2019-03-08 Yi Zhou, Yingbin Liang, Huishuai Zhang

Benign Underfitting of Stochastic Gradient Descent

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex…

Machine Learning · Computer Science 2023-01-13 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the…

Machine Learning · Computer Science 2021-08-17 Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

Assessing Generalization of SGD via Disagreement

We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate…

Machine Learning · Computer Science 2022-05-17 Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter

Disentangling the Mechanisms Behind Implicit Regularization in SGD

A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various…

Machine Learning · Computer Science 2022-11-30 Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary C. Lipton

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

This work studies the generalization error of gradient methods. More specifically, we focus on how training steps $T$ and step-size $\eta$ might affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide…

Machine Learning · Computer Science 2023-05-11 Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang

Implicit vs. explicit regularization for high-dimensional gradient descent

In this paper we investigate the generalization error of gradient descent (GD) applied to an $\ell_2$-regularized OLS objective function in the linear model. Based on our analysis we develop new methodology for computationally tractable and…

Statistics Theory · Mathematics 2026-01-27 Thomas Stark, Lukas Steinberger

Studying Generalization Through Data Averaging

The generalization of machine learning models has a complex dependence on the data, model and learning algorithm. We study train and test performance, as well as the generalization gap given by the mean of their difference over different…

Machine Learning · Statistics 2022-06-29 Carlos A. Gomez-Uribe

Dimension Independent Generalization Error by Stochastic Gradient Descent

One classical canon of statistics is that large models are prone to overfitting, and model selection procedures are necessary for high dimensional data. However, many overparameterized models, such as neural networks, perform very well in…

Machine Learning · Statistics 2021-01-05 Xi Chen, Qiang Liu, Xin T. Tong

Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression

This paper studies the generalization performance of iterates obtained by Gradient Descent (GD), Stochastic Gradient Descent (SGD) and their proximal variants in high-dimensional robust regression problems. The number of features is…

Statistics Theory · Mathematics 2024-11-05 Kai Tan, Pierre C. Bellec

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications

Deep neural networks with remarkably strong generalization performances are usually over-parameterized. Despite explicit regularization strategies are used for practitioners to avoid over-fitting, the impacts are often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang

Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due…

Machine Learning · Computer Science 2024-06-14 Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia

Convex SGD: Generalization Without Early Stopping

We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the…

Machine Learning · Computer Science 2024-04-16 Julien Hendrickx, Alex Olshevsky

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…

Machine Learning · Statistics 2026-05-26 Jose Blanchet, Peter Glynn, Wenhao Yang

Improving the convergence of SGD through adaptive batch sizes

Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science 2023-09-28 Scott Sievert, Shrey Shah

Obtaining Adjustable Regularization for Free via Iterate Averaging

Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a…

Machine Learning · Computer Science 2020-08-18 Jingfeng Wu, Vladimir Braverman, Lin F. Yang

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and…

Machine Learning · Computer Science 2023-10-26 Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion