Related papers: Deep learning generalizes because the parameter-fu…
Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data?…
Background: It is still an open research area to theoretically understand why Deep Neural Networks (DNNs)---equipped with many more parameters than training data and trained by (stochastic) gradient-based methods---often achieve remarkably…
Overparameterised deep neural networks (DNNs) are highly expressive and so can, in principle, generate almost any function that fits a training dataset with zero error. The vast majority of these functions will perform poorly on unseen…
Deep neural networks are renowned for their ability to generalise well across diverse tasks, even when heavily overparameterized. Existing works offer only partial explanations (for example, the NTK-based task-model alignment explanation…
Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space…
This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining…
Deep neural networks (DNNs) are typically optimized using various forms of mini-batch gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a suitably chosen batch size, available computing resources…
In this work, we describe a new approach that uses deep neural networks (DNN) to obtain regularization parameters for solving inverse problems. We consider a supervised learning approach, where a network is trained to approximate the…
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the…
Neural networks (NNs) are known to exhibit simplicity bias where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead to the model making biased…
In the past decade, deep neural networks (DNNs) came to the fore as the leading machine learning algorithms for a variety of tasks. Their raise was founded on market needs and engineering craftsmanship, the latter based more on trial and…
In this paper we consider Deep Neural Networks (DNNs) with a smooth activation function as surrogates for high-dimensional functions that are somewhat smooth but costly to evaluate. We consider the standard (non-periodic) DNNs as well as…
Deep neural networks often contain far more parameters than training examples, yet they still manage to generalize well in practice. Classical complexity measures such as VC-dimension or PAC-Bayes bounds usually become vacuous in this…
Background. A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one…
Generalization is essential for deep learning. In contrast to previous works claiming that Deep Neural Networks (DNNs) have an implicit regularization implemented by the stochastic gradient descent, we demonstrate explicitly Bayesian…
We study the generalization of over-parameterized deep networks (for image classification) in relation to the convex hull of their training sets. Despite their great success, generalization of deep networks is considered a mystery. These…
Deep learning methods are known to generalize well from training to future data, even in an overparametrized regime, where they could easily overfit. One explanation for this phenomenon is that even when their *ambient dimensionality*,…
Promising resolutions of the generalization puzzle observe that the actual number of parameters in a deep network is much smaller than naive estimates suggest. The renormalization group is a compelling example of a problem which has very…
We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear…
Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of…