Related papers: Improving Adaptivity via Over-Parameterization in …

200 papers

Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena,…

Machine Learning · Statistics 2020-12-01 Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

This paper introduces a diagonal adaptive kernel model that dynamically learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to the neural tangent kernel theory, the diagonal…

Machine Learning · Computer Science 2025-01-16 Yicheng Li, Qian Lin

In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very…

Optimization and Control · Mathematics 2025-01-30 Hung-Hsu Chou, Johannes Maly, Holger Rauhut

Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models -- contrary to…

Machine Learning · Computer Science 2023-05-18 Arthur Castello B. de Oliveira, Milad Siami, Eduardo D. Sontag

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training…

Machine Learning · Computer Science 2024-05-21 G. Welper

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep…

Machine Learning · Statistics 2022-02-08 Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…

Machine Learning · Statistics 2026-03-05 Xiaotong Liu, Yunwen Lei, Xiangyu Chang, Shao-Bo Lin

Any applied mathematical model contains parameters. The paper proposes to use kernel learning for the parametric analysis of the model. The approach consists in setting a distribution on the parameter space, obtaining a finite training…

Optimization and Control · Mathematics 2025-01-27 Vladimir Norkin, Alois Pichler

It is by now well-established that modern over-parameterized models seem to elude the bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to analyze this phenomenon in the relatively tractable…

Machine Learning · Computer Science 2024-02-21 Daniel Barzilai, Ohad Shamir

Despite classical statistical theory predicting severe overfitting, modern massively overparameterized neural networks still generalize well. This unexpected property is attributed to the network's so-called implicit bias, which describes…

Machine Learning · Computer Science 2025-03-14 Justin Sahs, Ryan Pyle, Fabio Anselmi, Ankit Patel

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably…

Machine Learning · Statistics 2020-08-18 Ben Adlam, Jeffrey Pennington

A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively…

Machine Learning · Computer Science 2020-03-03 Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite…

Statistics Theory · Mathematics 2021-03-17 Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and…

Machine Learning · Computer Science 2022-03-24 Kulin Shah, Amit Deshpande, Navin Goyal

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt, Dominik Stöger

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

This paper investigates the critical role of eigenalignments between the kernel matrix and learning targets in achieving robust generalization in learning problems. We establish a direct connection between generalization performance in…

Machine Learning · Statistics 2026-05-18 Yang Liu, Ernest Fokoue, Richard Lange, Daniel Krutz

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line…

Machine Learning · Computer Science 2024-09-18 Hung-Hsu Chou, Holger Rauhut, Rachel Ward

Recent advances in representation learning reveal that widely used objectives, such as contrastive and non-contrastive, implicitly perform spectral decomposition of a contextual kernel, induced by the relationship between inputs and their…

Machine Learning · Computer Science 2025-10-29 Burak Varıcı, Che-Ping Tsai, Ritabrata Ray, Nicholas M. Boffi, Pradeep Ravikumar