Related papers: Graph-Dependent Implicit Regularisation for Distri…

Data-Dependent Stability of Stochastic Gradient Descent

We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for…

Machine Learning · Computer Science 2018-02-19 Ilja Kuzborskij, Christoph H. Lampert

Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due…

Machine Learning · Computer Science 2024-06-14 Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia

The Benefits of Implicit Regularization from SGD in Least Squares Problems

Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which have been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to…

Machine Learning · Computer Science 2022-07-12 Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency…

Machine Learning · Statistics 2022-06-16 Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Benign Underfitting of Stochastic Gradient Descent

We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex…

Machine Learning · Computer Science 2023-01-13 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study

The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. This notion refers to the tendency of the…

Machine Learning · Computer Science 2020-12-23 Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is an upper bound on…

Machine Learning · Statistics 2023-02-13 Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications

Deep neural networks with remarkably strong generalization performance are usually over-parameterized. Although practitioners use explicit regularization strategies to avoid over-fitting, the impact is often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing…

Machine Learning · Computer Science 2019-12-16 Yunwen Lei, Ting Hu, Guiying Li, Ke Tang

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a…

Machine Learning · Computer Science 2021-02-11 Yikai Zhang, Wenjia Zhang, Sammy Bald, Vamsi Pingali, Chao Chen, Mayank Goswami

Decentralized Asynchronous Non-convex Stochastic Optimization on Directed Graphs

Distributed optimization is an increasingly important subject area with the rise of multi-agent control and optimization. We consider a decentralized stochastic optimization problem where the agents on a graph aim to asynchronously optimize…

Optimization and Control · Mathematics 2021-10-22 Vyacheslav Kungurtsev, Mahdi Morafah, Tara Javidi, Gesualdo Scutari

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness…

Machine Learning · Computer Science 2013-01-01 Ohad Shamir, Tong Zhang

Distributed Stochastic Gradient Method for Non-Convex Problems with Applications in Supervised Learning

We develop a distributed stochastic gradient descent algorithm for solving non-convex optimization problems under the assumption that the local objective functions are twice continuously differentiable with Lipschitz continuous gradients…

Optimization and Control · Mathematics 2019-08-20 Jemin George, Tao Yang, He Bai, Prudhvi Gurram

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs

Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an…

Machine Learning · Computer Science 2021-07-13 Satyen Kale, Ayush Sekhari, Karthik Sridharan

Stability and Generalization of the Decentralized Stochastic Gradient Descent

The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent…

Machine Learning · Statistics 2021-02-24 Tao Sun, Dongsheng Li, Bao Wang

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…

Machine Learning · Computer Science 2020-06-09 Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen, Ronny Luss

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

Recently, a considerable amount of work has been devoted to the study of algorithmic stability and generalization for stochastic gradient descent (SGD). However, the existing stability analysis requires imposing restrictive assumptions…

Machine Learning · Computer Science 2020-06-16 Yunwen Lei, Yiming Ying

Extended convexity and smoothness and their applications in deep learning

Classical assumptions like strong convexity and Lipschitz smoothness often fail to capture the nature of deep learning optimization problems, which are typically non-convex and non-smooth, making traditional analyses less applicable. This…

Machine Learning · Computer Science 2025-05-01 Binchuan Qi, Wei Gong, Li Li

A Bootstrap Perspective on Stochastic Gradient Descent

Machine learning models trained with stochastic gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the…

Machine Learning · Computer Science 2025-12-09 Hongjian Lan, Yucong Liu, Florian Schäfer