Related papers: Improving Adaptivity via Over-Parameterization in …

On Generalization of Adaptive Methods for Over-parameterized Linear Regression

Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena,…

Machine Learning · Statistics 2020-12-01 Vatsal Shah , Soumya Basu , Anastasios Kyrillidis , Sujay Sanghavi

Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity

This paper introduces a diagonal adaptive kernel model that dynamically learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to the neural tangent kernel theory, the diagonal…

Machine Learning · Computer Science 2025-01-16 Yicheng Li , Qian Lin

More is Less: Inducing Sparsity via Overparameterization

In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly training the neural network via (stochastic) gradient descent leads to models that generalize very…

Optimization and Control · Mathematics 2025-01-30 Hung-Hsu Chou , Johannes Maly , Holger Rauhut

On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations

Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models -- contrary to…

Machine Learning · Computer Science 2023-05-18 Arthur Castello B. de Oliveira , Milad Siami , Eduardo D. Sontag

Approximation and Gradient Descent Training with Neural Networks

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training…

Machine Learning · Computer Science 2024-05-21 G. Welper

An Improved Analysis of Training Over-parameterized Deep Neural Networks

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou , Quanquan Gu

Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks

Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep…

Machine Learning · Statistics 2022-02-08 Abdulkadir Canatar , Blake Bordelon , Cengiz Pehlevan

Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…

Machine Learning · Statistics 2026-03-05 Xiaotong Liu , Yunwen Lei , Xiangyu Chang , Shao-Bo Lin

Models Parametric Analysis via Adaptive Kernel Learning

Any applied mathematical model contains parameters. The paper proposes to use kernel learning for the parametric analysis of the model. The approach consists in setting a distribution on the parameter space, obtaining a finite training…

Optimization and Control · Mathematics 2025-01-27 Vladimir Norkin , Alois Pichler

Generalization in Kernel Regression Under Realistic Assumptions

It is by now well-established that modern over-parameterized models seem to elude the bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to analyze this phenomenon in the relatively tractable…

Machine Learning · Computer Science 2024-02-21 Daniel Barzilai , Ohad Shamir

The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity

Despite classical statistical theory predicting severe overfitting, modern massively overparameterized neural networks still generalize well. This unexpected property is attributed to the network's so-called implicit bias, which describes…

Machine Learning · Computer Science 2025-03-14 Justin Sahs , Ryan Pyle , Fabio Anselmi , Ankit Patel

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably…

Machine Learning · Statistics 2020-08-18 Ben Adlam , Jeffrey Pennington

Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively…

Machine Learning · Computer Science 2020-03-03 Thanh V. Nguyen , Raymond K. W. Wong , Chinmay Hegde

Deep learning: a statistical viewpoint

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite…

Statistics Theory · Mathematics 2021-03-17 Peter L. Bartlett , Andrea Montanari , Alexander Rakhlin

Learning and Generalization in Overparameterized Normalizing Flows

In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and…

Machine Learning · Computer Science 2022-03-24 Kulin Shah , Amit Deshpande , Navin Goyal

Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt , Dominik Stöger

Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear Networks

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min , Salma Tarmoun , Rene Vidal , Enrique Mallada

On Kernel Eigen-alignments of KRR: Reconstruction and Generalization

This paper investigates the critical role of eigenalignments between the kernel matrix and learning targets in achieving robust generalization in learning problems. We establish a direct connection between generalization performance in…

Machine Learning · Statistics 2026-05-18 Yang Liu , Ernest Fokoue , Richard Lange , Daniel Krutz

Robust Implicit Regularization via Weight Normalization

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line…

Machine Learning · Computer Science 2024-09-18 Hung-Hsu Chou , Holger Rauhut , Rachel Ward

Eigenfunction Extraction for Ordered Representation Learning

Recent advances in representation learning reveal that widely used objectives, such as contrastive and non-contrastive, implicitly perform spectral decomposition of a contextual kernel, induced by the relationship between inputs and their…

Machine Learning · Computer Science 2025-10-29 Burak Varıcı , Che-Ping Tsai , Ritabrata Ray , Nicholas M. Boffi , Pradeep Ravikumar