Related papers: Implicit Regularization for Group Sparsity

Implicit Gradient Regularization

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett , Benoit Dherin

Implicit Sparse Regularization: The Impact of Depth and Early Stopping

In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under…

Machine Learning · Statistics 2021-10-28 Jiangyuan Li , Thanh V. Nguyen , Chinmay Hegde , Raymond K. W. Wong

Implicit Regularization for Optimal Sparse Recovery

We investigate implicit regularization schemes for gradient descent methods applied to unpenalized least squares regression to solve the problem of reconstructing a sparse signal from an underdetermined system of linear measurements under…

Machine Learning · Statistics 2019-09-12 Tomas Vaškevičius , Varun Kanade , Patrick Rebeschini

Get rid of your constraints and reparametrize: A study in NNLS and implicit bias

Over the past years, there has been significant interest in understanding the implicit bias of gradient descent optimization and its connection to the generalization properties of overparametrized neural networks. Several works observed…

Optimization and Control · Mathematics 2025-03-11 Hung-Hsu Chou , Johannes Maly , Claudio Mayrink Verdun , Bernardo Freitas Paulo da Costa , Heudson Mirandola

High-Dimensional Linear Regression via Implicit Regularization

Many statistical estimators for high-dimensional linear regression are M-estimators, formed through minimizing a data-dependent square loss function plus a regularizer. This work considers a new class of estimators implicitly defined…

Statistics Theory · Mathematics 2022-02-15 Peng Zhao , Yun Yang , Qiao-Chu He

Robust Implicit Regularization via Weight Normalization

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line…

Machine Learning · Computer Science 2024-09-18 Hung-Hsu Chou , Holger Rauhut , Rachel Ward

Reparameterizing Mirror Descent as Gradient Descent

Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably more efficiently when the target is…

Machine Learning · Computer Science 2020-06-24 Ehsan Amid , Manfred K. Warmuth

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel , Francis Bach , Simon Lacoste-Julien

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…

Machine Learning · Computer Science 2020-06-09 Cong Ma , Kaizheng Wang , Yuejie Chi , Yuxin Chen

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev , Vit Fojtik , Hung-Hsu Chou , Gitta Kutyniok , Johannes Maly

Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt , Dominik Stöger

Depth Without the Magic: Inductive Bias of Natural Gradient Descent

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing…

Machine Learning · Statistics 2021-11-24 Anna Kerekes , Anna Mészáros , Ferenc Huszár

Implicit Reparameterization Gradients

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not…

Machine Learning · Computer Science 2019-01-31 Michael Figurnov , Shakir Mohamed , Andriy Mnih

Implicit Regularization in ReLU Networks with the Square Loss

Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for…

Machine Learning · Computer Science 2021-06-09 Gal Vardi , Ohad Shamir

Implicit Regularization in Deep Matrix Factorization

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora , Nadav Cohen , Wei Hu , Yuping Luo

Implicit Regularization in Over-parameterized Neural Networks

Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and…

Machine Learning · Computer Science 2019-03-07 Masayoshi Kubo , Ryotaro Banno , Hidetaka Manabe , Masataka Minoji

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how…

Machine Learning · Computer Science 2024-01-12 Haoyuan Sun , Khashayar Gatmiry , Kwangjun Ahn , Navid Azizan

Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing

The phenomenon of implicit regularization has attracted interest in recent years as a fundamental aspect of the remarkable generalizing ability of neural networks. In a nutshell, it entails that gradient descent dynamics in many neural…

Machine Learning · Computer Science 2024-02-28 Hong T. M. Chu , Subhro Ghosh , Chi Thanh Lam , Soumendu Sundar Mukherjee

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and…

Machine Learning · Computer Science 2023-10-26 Mathieu Even , Scott Pesme , Suriya Gunasekar , Nicolas Flammarion

The Statistical Complexity of Early-Stopped Mirror Descent

Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by…

Machine Learning · Statistics 2020-08-28 Tomas Vaškevičius , Varun Kanade , Patrick Rebeschini