Related papers: Deep learning generalizes because the parameter-fu…

The Low-Rank Simplicity Bias in Deep Networks

Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data?…

Machine Learning · Computer Science 2023-03-24 Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola

Understanding training and generalization in deep learning by Fourier analysis

Background: It is still an open research area to theoretically understand why Deep Neural Networks (DNNs), equipped with many more parameters than training data and trained by (stochastic) gradient-based methods, often achieve remarkably…

Machine Learning · Computer Science 2018-11-30 Zhiqin John Xu

Is SGD a Bayesian sampler? Well, almost

Overparameterised deep neural networks (DNNs) are highly expressive and so can, in principle, generate almost any function that fits a training dataset with zero error. The vast majority of these functions will perform poorly on unseen…

Machine Learning · Computer Science 2021-04-13 Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis

Characterising the Inductive Biases of Neural Networks on Boolean Data

Deep neural networks are renowned for their ability to generalise well across diverse tasks, even when heavily overparameterized. Existing works offer only partial explanations (for example, the NTK-based task-model alignment explanation…

Machine Learning · Computer Science 2025-06-02 Chris Mingard, Lukas Seier, Niclas Göring, Andrei-Vlad Badelita, Charles London, Ard Louis

A Survey on Data-Dependent Worst-Case Generalization Bounds

Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space…

Machine Learning · Statistics 2026-05-15 Hubert Leroux, Jean Marcus, Julien Roger

Benign Overfitting in Deep Neural Networks under Lazy Training

This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining…

Machine Learning · Computer Science 2023-06-01 Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Francesco Locatello, Volkan Cevher

A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes

Deep neural networks (DNNs) are typically optimized using various forms of mini-batch gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a suitably chosen batch size, available computing resources…

Machine Learning · Computer Science 2022-10-25 Oyebade K. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada

Learning Regularization Parameters of Inverse Problems via Deep Neural Networks

In this work, we describe a new approach that uses deep neural networks (DNN) to obtain regularization parameters for solving inverse problems. We consider a supervised learning approach, where a network is trained to approximate the…

Numerical Analysis · Mathematics 2021-04-15 Babak Maboudi Afkham, Julianne Chung, Matthias Chung

Explaining generalization in deep learning: progress and fundamental limits

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the…

Machine Learning · Computer Science 2021-10-19 Vaishnavh Nagarajan

Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness

Neural networks (NNs) are known to exhibit simplicity bias where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead to the model making biased…

Machine Learning · Computer Science 2023-10-11 Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan

Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective

In the past decade, deep neural networks (DNNs) came to the fore as the leading machine learning algorithms for a variety of tasks. Their rise was founded on market needs and engineering craftsmanship, the latter based more on trial and…

Machine Learning · Computer Science 2021-04-14 Omry Cohen, Or Malka, Zohar Ringel

Regularity and tailored regularization of Deep Neural Networks, with application to parametric PDEs in uncertainty quantification

In this paper we consider Deep Neural Networks (DNNs) with a smooth activation function as surrogates for high-dimensional functions that are somewhat smooth but costly to evaluate. We consider the standard (non-periodic) DNNs as well as…

Numerical Analysis · Mathematics 2026-03-04 Alexander Keller, Frances Y. Kuo, Dirk Nuyens, Ian H. Sloan

Provable Generalization in Overparameterized Neural Nets

Deep neural networks often contain far more parameters than training examples, yet they still manage to generalize well in practice. Classical complexity measures such as VC-dimension or PAC-Bayes bounds usually become vacuous in this…

Machine Learning · Computer Science 2025-08-26 Aviral Dhingra

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

Background. A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one…

Machine Learning · Computer Science 2025-02-18 Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry

Explicitly Bayesian Regularizations in Deep Learning

Generalization is essential for deep learning. In contrast to previous works claiming that Deep Neural Networks (DNNs) have an implicit regularization implemented by the stochastic gradient descent, we demonstrate explicitly Bayesian…

Machine Learning · Computer Science 2019-10-23 Xinjie Lan, Kenneth E. Barner

Deep Learning Generalization, Extrapolation, and Over-parameterization

We study the generalization of over-parameterized deep networks (for image classification) in relation to the convex hull of their training sets. Despite their great success, generalization of deep networks is considered a mystery. These…

Machine Learning · Computer Science 2022-03-22 Roozbeh Yousefzadeh

From Low Intrinsic Dimensionality to Non-Vacuous Generalization Bounds in Deep Multi-Task Learning

Deep learning methods are known to generalize well from training to future data, even in an overparametrized regime, where they could easily overfit. One explanation for this phenomenon is that even when their *ambient dimensionality*,…

Machine Learning · Computer Science 2025-05-22 Hossein Zakerinia, Dorsa Ghobadi, Christoph H. Lampert

Why Unsupervised Deep Networks Generalize

Promising resolutions of the generalization puzzle observe that the actual number of parameters in a deep network is much smaller than naive estimates suggest. The renormalization group is a compelling example of a problem which has very…

Machine Learning · Computer Science 2020-12-08 Anita de Mello Koch, Ellen de Mello Koch, Robert de Mello Koch

Pointwise Generalization in Deep Neural Networks

We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear…

Machine Learning · Computer Science 2026-05-19 Shaojie Li, Yunbei Xu

Function Norms and Regularization in Deep Networks

Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of…

Machine Learning · Computer Science 2018-07-02 Amal Rannen Triki, Maxim Berman, Matthew B. Blaschko