English
Related papers

Related papers: Estimating Implicit Regularization in Deep Learnin…

200 papers

Works on implicit regularization have studied gradient trajectories during the optimization process to explain why deep networks favor certain kinds of solutions over others. In deep linear networks, it has been shown that gradient descent…

Machine Learning · Computer Science 2023-06-02 Dan Zhao

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora , Nadav Cohen , Wei Hu , Yuping Luo

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett , Benoit Dherin

It is important to understand how dropout, a popular regularization method, aids in achieving a good generalization solution during neural network training. In this work, we present a theoretical derivation of an implicit regularization of…

Machine Learning · Computer Science 2023-04-11 Zhongwang Zhang , Zhi-Qin John Xu

Dropout is a widely-used regularization technique, often required to obtain state-of-the-art for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect…

Machine Learning · Computer Science 2020-10-16 Colin Wei , Sham Kakade , Tengyu Ma

Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from the…

Machine Learning · Computer Science 2023-02-06 Ryo Karakida , Tomoumi Takase , Tomohiro Hayase , Kazuki Osawa

Deep neural networks with remarkably strong generalization performances are usually over-parameterized. Despite explicit regularization strategies are used for practitioners to avoid over-fitting, the impacts are often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei , Zichen Sun , Yijun Xiao , William Yang Wang

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev , Vit Fojtik , Hung-Hsu Chou , Gitta Kutyniok , Johannes Maly

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt , Dominik Stöger

While the Implicit Bias(or Implicit Regularization) of standard loss functions has been studied, the optimization geometry induced by discriminative metric-learning objectives remains largely unexplored.To the best of our knowledge, this…

Machine Learning · Computer Science 2026-04-13 Jiawen Li

Algorithmic approaches endow deep learning systems with implicit bias that helps them generalize even in over-parametrized settings. In this paper, we focus on understanding such a bias induced in learning through dropout, a popular…

Machine Learning · Computer Science 2018-06-27 Poorya Mianjy , Raman Arora , Rene Vidal

In an attempt to better understand generalization in deep learning, we study several possible explanations. We show that implicit regularization induced by the optimization method is playing a key role in generalization and success of deep…

Machine Learning · Computer Science 2017-09-11 Behnam Neyshabur

Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of…

Machine Learning · Computer Science 2026-03-17 Jonathan Wenger , Beau Coker , Juraj Marusic , John P. Cunningham

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line…

Machine Learning · Computer Science 2024-09-18 Hung-Hsu Chou , Holger Rauhut , Rachel Ward

Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may…

Machine Learning · Computer Science 2020-10-20 Noam Razin , Nadav Cohen

Despite Deep Learning's (DL) empirical success, our theoretical understanding of its efficacy remains limited. One notable paradox is that while conventional wisdom discourages perfect data fitting, deep neural networks are designed to do…

Machine Learning · Computer Science 2024-02-06 Oria Gruber , Haim Avron

In deep learning, it is common to use more network parameters than training points. In such scenarioof over-parameterization, there are usually multiple networks that achieve zero training error so that thetraining algorithm induces an…

Machine Learning · Computer Science 2023-08-22 Hung-Hsu Chou , Carsten Gieshoff , Johannes Maly , Holger Rauhut

Contrary to most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Despite the fact that some (explicit) regularization techniques, such as weight decay…

Computer Vision and Pattern Recognition · Computer Science 2020-11-13 Alex Hernández-García , Peter König

Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in…

Machine Learning · Computer Science 2023-08-14 Jian Cao , Chen Qian , Yihui Huang , Dicheng Chen , Yuncheng Gao , Jiyang Dong , Di Guo , Xiaobo Qu

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel , Francis Bach , Simon Lacoste-Julien
‹ Prev 1 2 3 10 Next ›