Related papers: Manipulating Sparse Double Descent

The Quest of Finding the Antidote to Sparse Double Descent

In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity…

Artificial Intelligence · Computer Science · 2023-09-01 · Victor Quétu, Marta Milovanović

Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space

Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon…

Machine Learning · Computer Science · 2024-04-26 · Yufei Gu, Xiaoqing Zheng, Tomaso Aste

Can we avoid Double Descent in Deep Neural Networks?

Finding the optimal size of deep learning models is very actual and of broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, the "double descent", has caught the attention of the deep learning…

Machine Learning · Computer Science · 2023-12-27 · Victor Quétu, Enzo Tartaglione

Understanding the Double Descent Phenomenon in Deep Learning

Combining empirical risk minimization with capacity control is a classical strategy in machine learning when trying to control the generalization gap and avoid overfitting, as the model class capacity gets larger. Yet, in modern deep…

Machine Learning · Computer Science · 2024-03-18 · Marc Lafon, Alexandre Thomas

Sparse Double Descent: Where Network Pruning Aggravates Overfitting

People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network pruning sometimes even…

Machine Learning · Computer Science · 2022-06-20 · Zheng He, Zeke Xie, Quanzhi Zhu, Zengchang Qin

The Double Descent Behavior in Two Layer Neural Network for Binary Classification

Recent studies observed a surprising concept on model test error called the double descent phenomenon, where the increasing model complexity decreases the test error first and then the error increases and decreases again. To observe this,…

Machine Learning · Statistics · 2025-05-14 · Chathurika S Abeykoon, Aleksandr Beknazaryan, Hailin Sang

Multi-scale Feature Learning Dynamics: Insights for Double Descent

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial…

Machine Learning · Computer Science · 2021-12-07 · Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

Regularization-wise double descent: Why it occurs and how to eliminate it

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be…

Machine Learning · Computer Science · 2022-06-06 · Fatih Furkan Yilmaz, Reinhard Heckel

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data…

Machine Learning · Computer Science · 2023-03-27 · Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

Mitigating deep double descent by concatenating inputs

The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of…

Machine Learning · Computer Science · 2021-07-05 · John Chen, Qihan Wang, Anastasios Kyrillidis

On the Role of Optimization in Double Descent: A Least Squares Study

Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomena has been…

Machine Learning · Computer Science · 2021-07-28 · Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

Deep Double Descent: Where Bigger Models and More Data Hurt

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a…

Machine Learning · Computer Science · 2019-12-06 · Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

Phenomenology of Double Descent in Finite-Width Neural Networks

"Double descent" delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on…

Machine Learning · Statistics · 2022-03-15 · Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

Bayesian Double Descent

Double descent is a phenomenon of over-parameterized statistical models such as deep neural networks which have a re-descending property in their risk function. As the complexity of the model increases, risk exhibits a U-shaped region due…

Machine Learning · Statistics · 2025-10-16 · Nick Polson, Vadim Sokolov

Towards Understanding Epoch-wise Double descent in Two-layer Linear Neural Networks

Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve exhibiting two descents under the course of learning. Understanding the mechanisms…

Machine Learning · Statistics · 2024-09-20 · Amanda Olmin, Fredrik Lindsten

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a U-shaped curve reflecting a transition between under- and overfitting regimes. However,…

Machine Learning · Statistics · 2023-10-31 · Alicia Curth, Alan Jeffares, Mihaela van der Schaar

Consistency for Large Neural Networks: Regression and Classification

Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well known double descents phenomenon suggests that…

Machine Learning · Statistics · 2026-01-06 · Haoran Zhan, Yingcun Xia

DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?

Neoteric works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. Indeed, as the sparsity of the model increases, the test performance first worsens since the model is overfitting the training data;…

Machine Learning · Computer Science · 2024-02-09 · Victor Quétu, Enzo Tartaglione

Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon

In this paper, we studied two identically-trained neural networks (i.e. networks with the same architecture, trained on the same dataset using the same algorithm, but with different initialization) and found that their outputs discrepancy…

Machine Learning · Computer Science · 2023-05-26 · Yifan Luo, Bin Dong

Combining learning rate decay and weight decay with complexity gradient descent - Part I

The role of L² regularization, in the specific case of deep neural networks rather than more traditional machine learning models, is still not fully elucidated. We hypothesize that this complex interplay is due to the combination of…

Machine Learning · Computer Science · 2019-02-11 · Pierre H. Richemond, Yike Guo