Related papers: A Regularization-Sharpness Tradeoff for Linear Int…

Regularizing Extrapolation in Causal Inference

Many common estimators in machine learning and causal inference are linear smoothers, where the prediction is a weighted average of the training outcomes. Some estimators, such as ordinary least squares and kernel ridge regression, allow…

Machine Learning · Computer Science 2026-04-02 David Arbour, Harsh Parikh, Bijan Niknam, Elizabeth Stuart, Kara Rudolph, Avi Feller

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

In deep learning, often the training process finds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that received a lot of recent attention.…

Machine Learning · Computer Science 2023-05-29 Mo Zhou, Rong Ge

Overparameterization and generalization error: weighted trigonometric interpolation

Motivated by surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization…

Machine Learning · Computer Science 2021-10-29 Yuege Xie, Hung-Hsu Chou, Holger Rauhut, Rachel Ward

To Each Optimizer a Norm, To Each Norm its Generalization

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to…

Machine Learning · Computer Science 2022-07-12 Sharan Vaswani, Reza Babanezhad, Jose Gallego-Posada, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux

Memorize to Generalize: on the Necessity of Interpolation in High Dimensional Linear Regression

We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple…

Machine Learning · Statistics 2022-06-17 Chen Cheng, John Duchi, Rohith Kuditipudi

A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors

In this work we establish an algorithm and distribution independent non-asymptotic trade-off between the model size, excess test loss, and training loss of linear predictors. Specifically, we show that models that perform well on the test…

Machine Learning · Statistics 2023-04-20 Nikhil Ghosh, Mikhail Belkin

Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit

Sorted l1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new…

Statistics Theory · Mathematics 2022-06-07 Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie J. Su

High-Dimensional Linear Regression via Implicit Regularization

Many statistical estimators for high-dimensional linear regression are M-estimators, formed through minimizing a data-dependent square loss function plus a regularizer. This work considers a new class of estimators implicitly defined…

Statistics Theory · Mathematics 2022-02-15 Peng Zhao, Yun Yang, Qiao-Chu He

Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models

The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., number of parameters) reduces bias but also increases variance. Until recently, it was commonly…

Machine Learning · Statistics 2022-03-25 Jason W. Rocks, Pankaj Mehta

Penalising the biases in norm regularisation enforces sparsity

Controlling the parameters' norm often yields good generalisation when training neural networks. Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For…

Machine Learning · Statistics 2025-04-09 Etienne Boursier, Nicolas Flammarion

Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…

Machine Learning · Statistics 2025-10-29 Hannes Matt, Dominik Stöger

Sparse Regression with Multi-type Regularized Feature Modeling

Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are treated identically, such…

Computation · Statistics 2020-12-16 Sander Devriendt, Katrien Antonio, Tom Reynkens, Roel Verbelen

Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield…

Machine Learning · Statistics 2026-01-28 Julia Nakhleh, Robert D. Nowak

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

The distribution of Ridgeless least squares interpolators

The Ridgeless minimum $\ell_2$-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years in both machine learning and statistics communities. While it seems to defy conventional wisdom that…

Statistics Theory · Mathematics 2026-01-21 Qiyang Han, Xiaocong Xu

A study on tuning parameter selection for the high-dimensional lasso

High-dimensional predictive models, those with more measurements than observations, require regularization to be well defined, perform well empirically, and possess theoretical guarantees. The amount of regularization, often determined by…

Methodology · Statistics 2019-07-16 Darren Homrighausen, Daniel J. McDonald

Regularization properties of adversarially-trained linear regression

State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it…

Machine Learning · Statistics 2023-10-18 Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön

The Interpolating Information Criterion for Overparameterized Models

The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit,…

Machine Learning · Statistics 2026-01-13 Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin

Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms…

Machine Learning · Computer Science 2019-07-30 Kaiwen Wu, Yaoliang Yu

Bias-Variance Tradeoff of Graph Laplacian Regularizer

This paper presents a bias-variance tradeoff of graph Laplacian regularizer, which is widely used in graph signal processing and semi-supervised learning tasks. The scaling law of the optimal regularization parameter is specified in terms…

Machine Learning · Statistics 2017-08-02 Pin-Yu Chen, Sijia Liu