Related papers: Optimal Regularization Can Mitigate Double Descent

200 papers

Finding the optimal size of deep learning models is a timely problem with broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, "double descent", has caught the attention of the deep learning…

Machine Learning · Computer Science 2023-12-27 Victor Quétu, Enzo Tartaglione

In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity…

Artificial Intelligence · Computer Science 2023-09-01 Victor Quétu, Marta Milovanović

Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that…

Machine Learning · Statistics 2026-01-06 Haoran Zhan, Yingcun Xia

This study demonstrates that double descent can be mitigated by adding a dropout layer adjacent to the fully connected linear layer. The unexpected double-descent phenomenon has garnered substantial attention in recent years, resulting in…

Machine Learning · Computer Science 2025-08-08 Tian-Le Yang, Joe Suzuki

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be…

Machine Learning · Computer Science 2022-06-06 Fatih Furkan Yilmaz, Reinhard Heckel

Empirically, it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been…

Machine Learning · Computer Science 2021-07-28 Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization…

Machine Learning · Computer Science 2026-04-14 Gilad Karpel, Edward Moroshko, Ran Levinstein, Ron Meir, Daniel Soudry, Itay Evron

The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory…

Machine Learning · Computer Science 2023-12-08 Chris Yuhao Liu, Jeffrey Flanigan

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well…

Machine Learning · Statistics 2021-09-28 Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng

Data augmentation is one of the most popular techniques for improving the robustness of neural networks. In addition to directly training the model with original samples and augmented samples, a torrent of methods regularizing the distance…

Machine Learning · Computer Science 2020-11-30 Haohan Wang, Zeyi Huang, Xindi Wu, Eric P. Xing

Path regularization has been shown to be a very effective way to regularize neural network training, leading to better generalization than common regularizers such as weight decay. We propose a first near-complete (as will be made…

Machine Learning · Computer Science 2026-04-09 Hao Yu

Optimization plays a key role in the training of deep neural networks. Deciding when to stop training can have a substantial impact on the performance of the network during inference. Under certain conditions, the generalization error can…

Machine Learning · Computer Science 2021-09-20 Florian Dubost, Erin Hong, Max Pike, Siddharth Sharma, Siyi Tang, Nandita Bhaskhar, Christopher Lee-Messer, Daniel Rubin

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global…

Machine Learning · Statistics 2020-04-28 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate…

Machine Learning · Computer Science 2024-06-11 Xuyang Zhao, Huiyuan Wang, Weiran Huang, Wei Lin

Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern (or…

Machine Learning · Computer Science 2024-05-09 Yihao Xue, Kyle Whitecross, Baharan Mirzasoleiman

Recent studies have observed a surprising behavior of model test error called the double descent phenomenon, in which increasing model complexity first decreases the test error, after which the error increases and then decreases again. To observe this,…

Machine Learning · Statistics 2025-05-14 Chathurika S Abeykoon, Aleksandr Beknazaryan, Hailin Sang

Combining empirical risk minimization with capacity control is a classical strategy in machine learning when trying to control the generalization gap and avoid overfitting, as the model class capacity gets larger. Yet, in modern deep…

Machine Learning · Computer Science 2024-03-18 Marc Lafon, Alexandre Thomas

This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the role of L1 regularization and representation dimensions. It explores an alternative double descent phenomenon, named sparse double descent.…

Machine Learning · Computer Science 2024-01-22 Ya Shi Zhang

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial…

Machine Learning · Computer Science 2021-12-07 Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

In optimal transport, quadratic regularization is an alternative to entropic regularization when sparse couplings or small regularization parameters are desired. Quadratic regularization penalizes transport couplings by the squared $L^2$…

Optimization and Control · Mathematics 2026-05-20 Alberto González-Sanz, Marcel Nutz, Andrés Riveros Valdevenito