Related papers: Optimal Regularization Can Mitigate Double Descent

Can we avoid Double Descent in Deep Neural Networks?

Finding the optimal size of deep learning models is a highly topical problem with broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, the "double descent", has caught the attention of the deep learning…

Machine Learning · Computer Science 2023-12-27 Victor Quétu, Enzo Tartaglione

The Quest of Finding the Antidote to Sparse Double Descent

In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity…

Artificial Intelligence · Computer Science 2023-09-01 Victor Quétu, Marta Milovanović

Consistency for Large Neural Networks: Regression and Classification

Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that…

Machine Learning · Statistics 2026-01-06 Haoran Zhan, Yingcun Xia

Dropout Drops Double Descent

This study demonstrates that double descent can be mitigated by adding a dropout layer adjacent to the fully connected linear layer. The unexpected double-descent phenomenon has garnered substantial attention in recent years, resulting in…

Machine Learning · Computer Science 2025-08-08 Tian-Le Yang, Joe Suzuki

Regularization-wise double descent: Why it occurs and how to eliminate it

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be…

Machine Learning · Computer Science 2022-06-06 Fatih Furkan Yilmaz, Reinhard Heckel

On the Role of Optimization in Double Descent: A Least Squares Study

Empirically, it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been…

Machine Learning · Computer Science 2021-07-28 Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

Optimal L2 Regularization in High-dimensional Continual Linear Regression

We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization…

Machine Learning · Computer Science 2026-04-14 Gilad Karpel, Edward Moroshko, Ran Levinstein, Ron Meir, Daniel Soudry, Itay Evron

Understanding the Role of Optimization in Double Descent

The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory…

Machine Learning · Computer Science 2023-12-08 Chris Yuhao Liu, Jeffrey Flanigan

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well…

Machine Learning · Statistics 2021-09-28 Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng

Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations

Data augmentation is one of the most popular techniques for improving the robustness of neural networks. In addition to directly training the model with original samples and augmented samples, a torrent of methods regularizing the distance…

Machine Learning · Computer Science 2020-11-30 Haohan Wang, Zeyi Huang, Xindi Wu, Eric P. Xing

Path Regularization: A Near-Complete and Optimal Nonasymptotic Generalization Theory for Multilayer Neural Networks and Double Descent Phenomenon

Path regularization has been shown to be a very effective regularizer for training neural networks, leading to better generalization than common regularizers such as weight decay. We propose a first near-complete (as will be made…

Machine Learning · Computer Science 2026-04-09 Hao Yu

Double Descent Optimization Pattern and Aliasing: Caveats of Noisy Labels

Optimization plays a key role in the training of deep neural networks. Deciding when to stop training can have a substantial impact on the performance of the network during inference. Under certain conditions, the generalization error can…

Machine Learning · Computer Science 2021-09-20 Florian Dubost, Erin Hong, Max Pike, Siddharth Sharma, Siyi Tang, Nandita Bhaskhar, Christopher Lee-Messer, Daniel Rubin

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global…

Machine Learning · Statistics 2020-04-28 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

A Statistical Theory of Regularization-Based Continual Learning

We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate…

Machine Learning · Computer Science 2024-06-11 Xuyang Zhao, Huiyuan Wang, Weiran Huang, Wei Lin

Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise

Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern (or…

Machine Learning · Computer Science 2024-05-09 Yihao Xue, Kyle Whitecross, Baharan Mirzasoleiman

The Double Descent Behavior in Two Layer Neural Network for Binary Classification

Recent studies have observed a surprising behavior of model test error called the double descent phenomenon, where increasing model complexity first decreases the test error, after which the error increases and then decreases again. To observe this,…

Machine Learning · Statistics 2025-05-14 Chathurika S Abeykoon, Aleksandr Beknazaryan, Hailin Sang

Understanding the Double Descent Phenomenon in Deep Learning

Combining empirical risk minimization with capacity control is a classical strategy in machine learning when trying to control the generalization gap and avoid overfitting, as the model class capacity gets larger. Yet, in modern deep…

Machine Learning · Computer Science 2024-03-18 Marc Lafon, Alexandre Thomas

Manipulating Sparse Double Descent

This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the role of L1 regularization and representation dimensions. It explores an alternative double descent phenomenon, named sparse double descent.…

Machine Learning · Computer Science 2024-01-22 Ya Shi Zhang

Multi-scale Feature Learning Dynamics: Insights for Double Descent

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial…

Machine Learning · Computer Science 2021-12-07 Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

Linear Convergence of Gradient Descent for Quadratically Regularized Optimal Transport

In optimal transport, quadratic regularization is an alternative to entropic regularization when sparse couplings or small regularization parameters are desired. Quadratic regularization penalizes transport couplings by the squared $L^2$…

Optimization and Control · Mathematics 2026-05-20 Alberto González-Sanz, Marcel Nutz, Andrés Riveros Valdevenito