Related papers: Manipulating Sparse Double Descent

200 papers

In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity…

Artificial Intelligence · Computer Science 2023-09-01 Victor Quétu , Marta Milovanović

Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon…

Machine Learning · Computer Science 2024-04-26 Yufei Gu , Xiaoqing Zheng , Tomaso Aste

Finding the optimal size of deep learning models is very actual and of broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, the "double descent", has caught the attention of the deep learning…

Machine Learning · Computer Science 2023-12-27 Victor Quétu , Enzo Tartaglione

Combining empirical risk minimization with capacity control is a classical strategy in machine learning when trying to control the generalization gap and avoid overfitting, as the model class capacity gets larger. Yet, in modern deep…

Machine Learning · Computer Science 2024-03-18 Marc Lafon , Alexandre Thomas

People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network pruning sometimes even…

Machine Learning · Computer Science 2022-06-20 Zheng He , Zeke Xie , Quanzhi Zhu , Zengchang Qin

Recent studies observed a surprising concept on model test error called the double descent phenomenon, where the increasing model complexity decreases the test error first and then the error increases and decreases again. To observe this,…

Machine Learning · Statistics 2025-05-14 Chathurika S Abeykoon , Aleksandr Beknazaryan , Hailin Sang

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial…

Machine Learning · Computer Science 2021-12-07 Mohammad Pezeshki , Amartya Mitra , Yoshua Bengio , Guillaume Lajoie

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be…

Machine Learning · Computer Science 2022-06-06 Fatih Furkan Yilmaz , Reinhard Heckel

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data…

The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of…

Machine Learning · Computer Science 2021-07-05 John Chen , Qihan Wang , Anastasios Kyrillidis

Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been…

Machine Learning · Computer Science 2021-07-28 Ilja Kuzborskij , Csaba Szepesvári , Omar Rivasplata , Amal Rannen-Triki , Razvan Pascanu

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a…

Machine Learning · Computer Science 2019-12-06 Preetum Nakkiran , Gal Kaplun , Yamini Bansal , Tristan Yang , Boaz Barak , Ilya Sutskever

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on…

Machine Learning · Statistics 2022-03-15 Sidak Pal Singh , Aurelien Lucchi , Thomas Hofmann , Bernhard Schölkopf

Double descent is a phenomenon of over-parameterized statistical models such as deep neural networks which have a re-descending property in their risk function. As the complexity of the model increases, risk exhibits a U-shaped region due…

Machine Learning · Statistics 2025-10-16 Nick Polson , Vadim Sokolov

Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve exhibiting two descents under the course of learning. Understanding the mechanisms…

Machine Learning · Statistics 2024-09-20 Amanda Olmin , Fredrik Lindsten

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a U-shaped curve reflecting a transition between under- and overfitting regimes. However,…

Machine Learning · Statistics 2023-10-31 Alicia Curth , Alan Jeffares , Mihaela van der Schaar

Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that…

Machine Learning · Statistics 2026-01-06 Haoran Zhan , Yingcun Xia

Neoteric works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. Indeed, as the sparsity of the model increases, the test performance first worsens since the model is overfitting the training data;…

Machine Learning · Computer Science 2024-02-09 Victor Quétu , Enzo Tartaglione

In this paper, we studied two identically-trained neural networks (i.e. networks with the same architecture, trained on the same dataset using the same algorithm, but with different initialization) and found that their outputs discrepancy…

Machine Learning · Computer Science 2023-05-26 Yifan Luo , Bin Dong

The role of $L^2$ regularization, in the specific case of deep neural networks rather than more traditional machine learning models, is still not fully elucidated. We hypothesize that this complex interplay is due to the combination of…

Machine Learning · Computer Science 2019-02-11 Pierre H. Richemond , Yike Guo