Related papers: SGD Implicitly Regularizes Generalization Error

200 papers

Machine learning models trained with stochastic gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the…

Machine Learning · Computer Science 2025-12-09 Hongjian Lan, Yucong Liu, Florian Schäfer

For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of gradient flow on the full batch loss function. However moderately large learning rates can achieve higher test accuracies, and this generalization…

Machine Learning · Computer Science 2021-01-29 Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD) in the fundamental stochastic convex optimization model. While for SGD it is well-known that…

Machine Learning · Computer Science 2021-07-01 Idan Amir, Tomer Koren, Roi Livni

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied…

Machine Learning · Statistics 2019-03-08 Yi Zhou, Yingbin Liang, Huishuai Zhang

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex…

Machine Learning · Computer Science 2023-01-13 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the…

Machine Learning · Computer Science 2021-08-17 Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate…

Machine Learning · Computer Science 2022-05-17 Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter

A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various…

Machine Learning · Computer Science 2022-11-30 Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary C. Lipton

This work studies the generalization error of gradient methods. More specifically, we focus on how training steps $T$ and step-size $\eta$ might affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide…

Machine Learning · Computer Science 2023-05-11 Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang

In this paper we investigate the generalization error of gradient descent (GD) applied to an $\ell_2$-regularized OLS objective function in the linear model. Based on our analysis we develop new methodology for computationally tractable and…

Statistics Theory · Mathematics 2026-01-27 Thomas Stark, Lukas Steinberger

The generalization of machine learning models has a complex dependence on the data, model and learning algorithm. We study train and test performance, as well as the generalization gap given by the mean of their difference over different…

Machine Learning · Statistics 2022-06-29 Carlos A. Gomez-Uribe

One classical canon of statistics is that large models are prone to overfitting, and model selection procedures are necessary for high dimensional data. However, many overparameterized models, such as neural networks, perform very well in…

Machine Learning · Statistics 2021-01-05 Xi Chen, Qiang Liu, Xin T. Tong

This paper studies the generalization performance of iterates obtained by Gradient Descent (GD), Stochastic Gradient Descent (SGD) and their proximal variants in high-dimensional robust regression problems. The number of features is…

Statistics Theory · Mathematics 2024-11-05 Kai Tan, Pierre C. Bellec

Deep neural networks with remarkably strong generalization performances are usually over-parameterized. Despite explicit regularization strategies are used for practitioners to avoid over-fitting, the impacts are often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due…

Machine Learning · Computer Science 2024-06-14 Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia

We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the…

Machine Learning · Computer Science 2024-04-16 Julien Hendrickx, Alex Olshevsky

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…

Machine Learning · Statistics 2026-05-26 Jose Blanchet, Peter Glynn, Wenhao Yang

Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science 2023-09-28 Scott Sievert, Shrey Shah

Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a…

Machine Learning · Computer Science 2020-08-18 Jingfeng Wu, Vladimir Braverman, Lin F. Yang

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and…

Machine Learning · Computer Science 2023-10-26 Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion