Related papers: Implicit Bias in Deep Linear Classification: Initi…

Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear Networks

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training…

Optimization and Control · Mathematics 2020-06-23 Lenaic Chizat, Francis Bach

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

Limitations of Implicit Bias in Matrix Sensing: Initialization Rank Matters

In matrix sensing, we first numerically identify the sensitivity to the initialization rank as a new limitation of the implicit bias of gradient flow. We will partially quantify this phenomenon mathematically, where we establish that the…

Information Theory · Computer Science 2021-06-08 Armin Eftekhari, Konstantinos Zygalakis

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks

Understanding the asymptotic behavior of gradient-descent training of deep neural networks is essential for revealing inductive biases and improving network performance. We derive the infinite-time training limit of a mathematically…

Machine Learning · Statistics 2022-02-08 Samuel Lippl, L. F. Abbott, SueYeon Chung

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

In deep learning, it is common to use more network parameters than training points. In such scenario of over-parameterization, there are usually multiple networks that achieve zero training error so that the training algorithm induces an…

Machine Learning · Computer Science 2023-08-22 Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut

Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt, Dominik Stöger

Implicit bias of deep linear networks in the large learning rate phase

Most theoretical studies explaining the regularization effect in deep learning have only focused on gradient descent with a sufficiently small learning rate or even gradient flow (infinitesimal learning rate). Such researches, however, have…

Machine Learning · Computer Science 2020-12-17 Wei Huang, Weitao Du, Richard Yi Da Xu, Chunrui Liu

A Unifying View on Implicit Bias in Training Linear Neural Networks

We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training. We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and…

Machine Learning · Computer Science 2021-09-13 Chulhee Yun, Shankar Krishnan, Hossein Mobahi

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime…

Machine Learning · Computer Science 2021-02-22 Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer…

Machine Learning · Computer Science 2022-10-14 Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

Implicit Bias in Deep Linear Discriminant Analysis

While the Implicit Bias (or Implicit Regularization) of standard loss functions has been studied, the optimization geometry induced by discriminative metric-learning objectives remains largely unexplored. To the best of our knowledge, this…

Machine Learning · Computer Science 2026-04-13 Jiawen Li

Implicit Gradient Regularization

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett, Benoit Dherin

Continuous vs. Discrete Optimization of Deep Neural Networks

Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is…

Machine Learning · Computer Science 2021-12-30 Omer Elkabetz, Nadav Cohen

Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks

Works on implicit regularization have studied gradient trajectories during the optimization process to explain why deep networks favor certain kinds of solutions over others. In deep linear networks, it has been shown that gradient descent…

Machine Learning · Computer Science 2023-06-02 Dan Zhao

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear…

Machine Learning · Computer Science 2021-12-08 Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion

On the Role of Initialization on the Implicit Bias in Deep Linear Networks

Despite Deep Learning's (DL) empirical success, our theoretical understanding of its efficacy remains limited. One notable paradox is that while conventional wisdom discourages perfect data fitting, deep neural networks are designed to do…

Machine Learning · Computer Science 2024-02-06 Oria Gruber, Haim Avron

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data

The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well. While the implicit bias of gradient flow has been widely studied…

Machine Learning · Computer Science 2023-10-31 Yiwen Kou, Zixiang Chen, Quanquan Gu

Gradient Descent as Implicit EM in Distance-Based Neural Models

Neural networks trained with standard objectives exhibit behaviors characteristic of probabilistic inference: soft clustering, prototype specialization, and Bayesian uncertainty tracking. These phenomena appear across architectures -- in…

Machine Learning · Computer Science 2026-01-01 Alan Oursland