Related papers: Double descent in quantum kernel methods

Double Descent in Quantum Kernel Ridge Regression

Various classical machine learning models, including linear regression, kernel methods, and deep neural networks, exhibit double descent, in which the test risk peaks near the interpolation threshold and then decreases in the…

Quantum Physics · Physics 2026-04-21 Kensuke Kamisoyama, Lento Nagano, Koji Terashi

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data…

Machine Learning · Computer Science 2023-03-27 Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably…

Machine Learning · Statistics 2020-08-18 Ben Adlam, Jeffrey Pennington

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a U-shaped curve reflecting a transition between under- and overfitting regimes. However,…

Machine Learning · Statistics 2023-10-31 Alicia Curth, Alan Jeffares, Mihaela van der Schaar

Multi-scale Feature Learning Dynamics: Insights for Double Descent

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial…

Machine Learning · Computer Science 2021-12-07 Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

Understanding the Double Descent Phenomenon in Deep Learning

Combining empirical risk minimization with capacity control is a classical strategy in machine learning when trying to control the generalization gap and avoid overfitting, as the model class capacity gets larger. Yet, in modern deep…

Machine Learning · Computer Science 2024-03-18 Marc Lafon, Alexandre Thomas

Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon

In this paper, we studied two identically-trained neural networks (i.e. networks with the same architecture, trained on the same dataset using the same algorithm, but with different initialization) and found that their outputs discrepancy…

Machine Learning · Computer Science 2023-05-26 Yifan Luo, Bin Dong

Mitigating deep double descent by concatenating inputs

The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of…

Machine Learning · Computer Science 2021-07-05 John Chen, Qihan Wang, Anastasios Kyrillidis

Understanding the double descent curve in Machine Learning

The theory of bias-variance used to serve as a guide for model selection when applying Machine Learning algorithms. However, modern practice has shown success with over-parameterized models that were expected to overfit but did not. This…

Machine Learning · Computer Science 2022-11-21 Luis Sa-Couto, Jose Miguel Ramos, Miguel Almeida, Andreas Wichert

Double Descent and Overparameterization in Particle Physics Data

Recently, the benefit of heavily overparameterized models has been observed in machine learning tasks: models with enough capacity to easily cross the interpolation threshold improve in generalization error compared to the classical…

High Energy Physics - Experiment · Physics 2025-09-03 Matthias Vigl, Lukas Heinrich

Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions

Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization…

Machine Learning · Statistics 2022-01-21 Mojtaba Sahraee-Ardakan, Melikasadat Emami, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Consistency for Large Neural Networks: Regression and Classification

Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that…

Machine Learning · Statistics 2026-01-06 Haoran Zhan, Yingcun Xia

Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space

Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon…

Machine Learning · Computer Science 2024-04-26 Yufei Gu, Xiaoqing Zheng, Tomaso Aste

Deep Double Descent: Where Bigger Models and More Data Hurt

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a…

Machine Learning · Computer Science 2019-12-06 Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

Phenomenology of Double Descent in Finite-Width Neural Networks

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on…

Machine Learning · Statistics 2022-03-15 Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

Double descent: When do neural quantum states generalize?

Neural quantum states (NQS) provide flexible and compact wavefunction parameterizations for numerical studies of quantum many-body physics. In particular, NQS aim to circumvent the exponential scaling of the Hilbert space by compressing…

Disordered Systems and Neural Networks · Physics 2026-03-17 M. Schuyler Moss, Alev Orfi, Christopher Roth, Anirvan M. Sengupta, Antoine Georges, Dries Sels, Anna Dawid, Agnes Valenti

On the Role of Optimization in Double Descent: A Least Squares Study

Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been…

Machine Learning · Computer Science 2021-07-28 Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

Understanding the Role of Optimization in Double Descent

The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory…

Machine Learning · Computer Science 2023-12-08 Chris Yuhao Liu, Jeffrey Flanigan

The Double Descent Behavior in Two Layer Neural Network for Binary Classification

Recent studies observed a surprising concept on model test error called the double descent phenomenon, where the increasing model complexity decreases the test error first and then the error increases and decreases again. To observe this,…

Machine Learning · Statistics 2025-05-14 Chathurika S Abeykoon, Aleksandr Beknazaryan, Hailin Sang

Analysis of Interpolating Regression Models and the Double Descent Phenomenon

A regression model with more parameters than data points in the training data is overparametrized and has the capability to interpolate the training data. Based on the classical bias-variance tradeoff expressions, it is commonly assumed…

Machine Learning · Computer Science 2023-04-18 Tomas McKelvey