Related papers: Graph-Dependent Implicit Regularisation for Distri…

200 papers

We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for…

Machine Learning · Computer Science 2018-02-19 Ilja Kuzborskij, Christoph H. Lampert

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due…

Machine Learning · Computer Science 2024-06-14 Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia

Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to…

Machine Learning · Computer Science 2022-07-12 Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency…

Machine Learning · Statistics 2022-06-16 Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex…

Machine Learning · Computer Science 2023-01-13 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. This notion refers to the tendency of the…

Machine Learning · Computer Science 2020-12-23 Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni

We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is an upper bound on…

Machine Learning · Statistics 2023-02-13 Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias

Deep neural networks with remarkably strong generalization performance are usually over-parameterized. Although practitioners use explicit regularization strategies to avoid over-fitting, the impact is often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing…

Machine Learning · Computer Science 2019-12-16 Yunwen Lei, Ting Hu, Guiying Li, Ke Tang

Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a…

Machine Learning · Computer Science 2021-02-11 Yikai Zhang, Wenjia Zhang, Sammy Bald, Vamsi Pingali, Chao Chen, Mayank Goswami

Distributed Optimization is an increasingly important subject area with the rise of multi-agent control and optimization. We consider a decentralized stochastic optimization problem where the agents on a graph aim to asynchronously optimize…

Optimization and Control · Mathematics 2021-10-22 Vyacheslav Kungurtsev, Mahdi Morafah, Tara Javidi, Gesualdo Scutari

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness…

Machine Learning · Computer Science 2013-01-01 Ohad Shamir, Tong Zhang

We develop a distributed stochastic gradient descent algorithm for solving non-convex optimization problems under the assumption that the local objective functions are twice continuously differentiable with Lipschitz continuous gradients…

Optimization and Control · Mathematics 2019-08-20 Jemin George, Tao Yang, He Bai, Prudhvi Gurram

Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an…

Machine Learning · Computer Science 2021-07-13 Satyen Kale, Ayush Sekhari, Karthik Sridharan

The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent…

Machine Learning · Statistics 2021-02-24 Tao Sun, Dongsheng Li, Bao Wang

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…

Machine Learning · Computer Science 2020-06-09 Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen, Ronny Luss

Recently, a considerable amount of work has been devoted to the study of algorithmic stability and generalization for stochastic gradient descent (SGD). However, the existing stability analysis requires imposing restrictive assumptions…

Machine Learning · Computer Science 2020-06-16 Yunwen Lei, Yiming Ying

Classical assumptions like strong convexity and Lipschitz smoothness often fail to capture the nature of deep learning optimization problems, which are typically non-convex and non-smooth, making traditional analyses less applicable. This…

Machine Learning · Computer Science 2025-05-01 Binchuan Qi, Wei Gong, Li Li

Machine learning models trained with stochastic gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the…

Machine Learning · Computer Science 2025-12-09 Hongjian Lan, Yucong Liu, Florian Schäfer