Related papers: Theoretical Issues in Deep Networks: Approximation…

Theory of Deep Learning III: explaining the non-overfitting puzzle

A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics…

Machine Learning · Computer Science 2018-01-17 Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$

Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations…

Machine Learning · Computer Science 2022-09-20 R. Gentile, G. Welper

Theory III: Dynamics and Generalization in Deep Networks

The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that…

Machine Learning · Computer Science 2020-04-14 Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio

Theory IIIb: Generalization in Deep Networks

A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient…

Machine Learning · Computer Science 2018-07-02 Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary

A Convergence Theory for Deep Learning via Over-Parameterization

Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice are going wider and deeper. On the theoretical side, a long line of works has been focusing on training neural…

Machine Learning · Computer Science 2019-06-18 Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

Deep neural networks' remarkable ability to correctly fit training data when optimized by gradient-based algorithms is yet to be fully understood. Recent theoretical results explain the convergence for ReLU networks that are wider than…

Machine Learning · Computer Science 2021-02-09 Asaf Noy, Yi Xu, Yonathan Aflalo, Lihi Zelnik-Manor, Rong Jin

Approximation Power of Deep Neural Networks: an explanatory mathematical survey

This survey provides an in-depth and explanatory review of the approximation properties of deep neural networks, with a focus on feed-forward and residual architectures. The primary objective is to examine how effectively neural networks…

Machine Learning · Computer Science 2024-12-18 Owen Davis, Mohammad Motamed

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high degree polynomial of the training sample size $n$ and the inverse of the target…

Machine Learning · Computer Science 2022-01-03 Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu

A global convergence theory for deep ReLU implicit networks via over-parameterization

Implicit deep learning has received increasing attention recently due to the fact that it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is provided implicitly based on the…

Machine Learning · Computer Science 2022-02-21 Tianxiang Gao, Hailiang Liu, Jia Liu, Hridesh Rajan, Hongyang Gao

On the optimization and generalization of overparameterized implicit neural networks

Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance but use much less computational resources. Recently, a line of theoretical works established the…

Machine Learning · Computer Science 2022-10-03 Tianxiang Gao, Hongyang Gao

Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt, Dominik Stöger

Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay

Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few indirect measurements generated via a known acquisition procedure. In particular, neural networks perform well…

Machine Learning · Computer Science 2025-12-05 Hannah Laus, Suzanna Parkinson, Vasileios Charisopoulos, Felix Krahmer, Rebecca Willett

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

Deep Neural Network Approximation Theory

This paper develops fundamental limits of deep neural network learning by characterizing what is possible if no constraints are imposed on the learning algorithm and on the amount of training data. Concretely, we consider Kolmogorov-optimal…

Machine Learning · Computer Science 2021-03-15 Dennis Elbrächter, Dmytro Perekrestenko, Philipp Grohs, Helmut Bölcskei

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for…

Machine Learning · Computer Science 2018-12-31 Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

Generalization and Expressivity for Deep Nets

Along with the rapid development of deep learning in practice, the theoretical explanations for its success become urgent. Generalization and expressivity are two widely used measurements to quantify theoretical behaviors of deep learning.…

Machine Learning · Computer Science 2018-03-26 Shao-Bo Lin

Implicit Regularization in Over-parameterized Neural Networks

Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and…

Machine Learning · Computer Science 2019-03-07 Masayoshi Kubo, Ryotaro Banno, Hidetaka Manabe, Masataka Minoji

Explaining generalization in deep learning: progress and fundamental limits

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the…

Machine Learning · Computer Science 2021-10-19 Vaishnavh Nagarajan

Gradient Descent Quantizes ReLU Network Features

Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several…

Machine Learning · Statistics 2018-03-23 Hartmut Maennel, Olivier Bousquet, Sylvain Gelly

Generalization Error Bounds of Gradient Descent for Learning Over-parameterized Deep ReLU Networks

Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. Very…

Machine Learning · Computer Science 2019-11-28 Yuan Cao, Quanquan Gu