Related papers: On the interplay between data structure and loss f…

Double descent for least-squares interpolation on contaminated data: A simulation study

Overparametrized models can exhibit excellent generalization performance, although they should be prone to overfitting according to classical statistical theory. The discovery of the "double descent", indicating that the generalization…

Machine Learning · Computer Science · 2026-05-22 · Tino Werner

Generalisation error in learning with random features and the hidden manifold model

We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden…

Statistics Theory · Mathematics · 2022-03-28 · Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, Lenka Zdeborová

Consistency for Large Neural Networks: Regression and Classification

Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that…

Machine Learning · Statistics · 2026-01-06 · Haoran Zhan, Yingcun Xia

The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks

The success of deep neural networks largely depends on the statistical structure of the training data. While learning dynamics and generalization on isotropic data are well-established, the impact of pronounced anisotropy on these crucial…

Machine Learning · Statistics · 2026-01-13 · Taishi Watanabe, Ryo Karakida, Jun-nosuke Teramae

Learning Curves for SGD on Structured Features

The generalization performance of a machine learning algorithm such as a neural network depends in a non-trivial way on the structure of the data distribution. To analyze the influence of data structure on test loss dynamics, we study an…

Machine Learning · Statistics · 2022-03-16 · Blake Bordelon, Cengiz Pehlevan

Learning through atypical "phase transitions" in overparameterized neural networks

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of…

Machine Learning · Computer Science · 2022-07-27 · Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

More Data Can Hurt for Linear Regression: Sample-wise Double Descent

In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent…

Machine Learning · Statistics · 2019-12-17 · Preetum Nakkiran

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data…

Machine Learning · Computer Science · 2023-03-27 · Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression

Successful deep learning models often involve training neural network architectures that contain more parameters than the number of training samples. Such overparametrized models have been extensively studied in recent years, and the…

Machine Learning · Computer Science · 2024-02-02 · Hamed Hassani, Adel Javanmard

Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors

We study the linear subspace fitting problem in the overparameterized setting, where the estimated subspace can perfectly interpolate the training examples. Our scope includes the least-squares solutions to subspace fitting tasks with…

Machine Learning · Computer Science · 2020-08-21 · Yehuda Dar, Paul Mayer, Lorenzo Luzi, Richard G. Baraniuk

Learning curves for deep structured Gaussian feature models

In recent years, significant attention in deep learning theory has been devoted to analyzing when models that interpolate their training data can still generalize well to unseen examples. Many insights have been gained from studying models…

Machine Learning · Statistics · 2023-10-24 · Jacob A. Zavatone-Veth, Cengiz Pehlevan

Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks

We study the transfer learning process between two linear regression problems. An important and timely special case is when the regressors are overparameterized and perfectly interpolate their training data. We examine a parameter transfer…

Machine Learning · Computer Science · 2022-09-29 · Yehuda Dar, Richard G. Baraniuk

Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks

We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt the overparameterization perspective -- featuring interpolation of the training data (i.e., approximately zero train error) and the double…

Machine Learning · Computer Science · 2023-06-13 · Yehuda Dar, Lorenzo Luzi, Richard G. Baraniuk

Classification vs regression in overparameterized regimes: Does the loss function matter?

We compare classification and regression tasks in an overparameterized linear model with Gaussian features. On the one hand, we show that with sufficient overparameterization all training points are support vectors: solutions obtained by…

Machine Learning · Computer Science · 2021-10-15 · Vidya Muthukumar, Adhyyan Narang, Vignesh Subramanian, Mikhail Belkin, Daniel Hsu, Anant Sahai

Evaluating the Impact of Loss Function Variation in Deep Learning for Classification

The loss function is arguably among the most important hyperparameters for a neural network. Many loss functions have been designed to date, making a correct choice nontrivial. However, elaborate justifications regarding the choice of the…

Machine Learning · Computer Science · 2022-10-31 · Simon Dräger, Jannik Dunkelau

Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime

Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" -…

Machine Learning · Computer Science · 2020-04-06 · Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive…

Machine Learning · Computer Science · 2024-03-13 · Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu

Analysis of Overparameterization in Continual Learning under a Linear Model

Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed in order to understand the extent of forgetting during continual learning. As a…

Machine Learning · Computer Science · 2025-02-18 · Daniel Goldfarb, Paul Hand

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists -…

Machine Learning · Computer Science · 2020-12-17 · Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a U-shaped curve reflecting a transition between under- and overfitting regimes. However,…

Machine Learning · Statistics · 2023-10-31 · Alicia Curth, Alan Jeffares, Mihaela van der Schaar