Related papers: "Deep learning: a statistical viewpoint"

On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations

Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models, contrary to…

Machine Learning · Computer Science · 2023-05-18 · Arthur Castello B. de Oliveira, Milad Siami, Eduardo D. Sontag

Benign Overfitting in Linear Regression

The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider…

Machine Learning · Statistics · 2022-06-08 · Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler

Understanding Benign Overfitting in Gradient-Based Meta Learning

Meta learning has demonstrated tremendous success in few-shot learning with limited supervised data. In those settings, the meta model is usually overparameterized. While the conventional statistical learning theory suggests that…

Machine Learning · Computer Science · 2022-11-10 · Lisha Chen, Songtao Lu, Tianyi Chen

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science · 2025-12-19 · Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

A Framework for Overparameterized Learning

A candidate explanation of the good empirical performance of deep neural networks is the implicit regularization effect of first order optimization methods. Inspired by this, we prove a convergence theorem for nonconvex composite…

Machine Learning · Computer Science · 2023-02-14 · Dávid Terjék, Diego González-Sánchez

A Classical View on Benign Overfitting: The Role of Sample Size

Benign overfitting is a phenomenon in machine learning where a model perfectly fits (interpolates) the training data, including noisy examples, yet still generalizes well to unseen data. Understanding this phenomenon has attracted…

Machine Learning · Computer Science · 2025-05-20 · Junhyung Park, Patrick Bloebaum, Shiva Prasad Kasiviswanathan

The Implicit Bias of Benign Overfitting

The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while attaining near-optimal expected loss, has received much attention in recent years, but still remains not fully understood beyond well-specified…

Machine Learning · Computer Science · 2023-04-18 · Ohad Shamir

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

In deep learning, it is common to use more network parameters than training points. In such a scenario of over-parameterization, there are usually multiple networks that achieve zero training error, so that the training algorithm induces an…

Machine Learning · Computer Science · 2023-08-22 · Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut

On Generalization of Adaptive Methods for Over-parameterized Linear Regression

Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena,…

Machine Learning · Statistics · 2020-12-01 · Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

Overfitting Mechanism and Avoidance in Deep Neural Networks

Assisted by the availability of data and high performance computing, deep learning techniques have achieved breakthroughs and surpassed human performance empirically in difficult tasks, including object recognition, speech recognition, and…

Machine Learning · Computer Science · 2019-01-23 · Shaeke Salman, Xiuwen Liu

Deep Learning is Not So Mysterious or Different

Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success…

Machine Learning · Computer Science · 2025-07-11 · Andrew Gordon Wilson

To understand deep learning we need to understand kernel learning

Generalization performance of classifiers in deep learning has recently become a subject of intense study. Deep models, typically over-parametrized, tend to fit the training data exactly. Despite this "overfitting", they perform well on…

Machine Learning · Statistics · 2018-06-18 · Mikhail Belkin, Siyuan Ma, Soumik Mandal

Learning through atypical "phase transitions" in overparameterized neural networks

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of…

Machine Learning · Computer Science · 2022-07-27 · Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

An open question in the Deep Learning community is why neural networks trained with Gradient Descent generalize well on real datasets even though they are capable of fitting random data. We propose an approach to answering this question…

Machine Learning · Computer Science · 2020-02-26 · Satrajit Chatterjee

Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear Networks

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science · 2022-05-17 · Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Theory of Deep Learning III: explaining the non-overfitting puzzle

A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics…

Machine Learning · Computer Science · 2018-01-17 · Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization

While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning, however, is beginning to emerge. It covers the following questions: 1)…

Machine Learning · Computer Science · 2019-08-27 · Tomaso Poggio, Andrzej Banburski, Qianli Liao

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we…

Machine Learning · Computer Science · 2025-07-04 · Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Variational Deep Learning via Implicit Regularization

Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of…

Machine Learning · Computer Science · 2026-03-17 · Jonathan Wenger, Beau Coker, Juraj Marusic, John P. Cunningham

Nearly Minimal Over-Parametrization of Shallow Neural Networks

A recent line of work has shown that an overparametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario,…

Machine Learning · Computer Science · 2019-10-30 · Armin Eftekhari, ChaeHwan Song, Volkan Cevher