Optimal Cross-Validation for Sparse Linear Regression

Optimization and Control; Machine Learning; Methodology (v4, 2026-02-13)

Abstract

Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between the covariates and the response in an interpretable manner. To choose the hyperparameters that control the sparsity level and the amount of regularization, practitioners commonly use k-fold cross-validation. However, cross-validation substantially increases the computational cost of sparse regression, as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To address this computational burden, we derive computationally tractable relaxations of the k-fold cross-validation loss, facilitating hyperparameter selection while solving 50–80% fewer MIOs in practice. Our computational results demonstrate, across eleven real-world UCI datasets, that exact MIO-based cross-validation can be competitive with mature software packages such as glmnet and L0Learn, particularly when the sample-to-feature ratio is small.
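To make the cost of naive cross-validation concrete, here is a minimal sketch. It assumes the common formulation: minimize ||y - Xβ||² + (1/γ)||β||² subject to β having at most τ nonzero entries. The symbols γ and τ and all function names (`sparse_ridge_fit`, `kfold_cv_error`, `grid_search`) are our own hypothetical choices. Each inner fit is solved by brute-force enumeration of supports, which stands in for an MIO solver and is only feasible for a handful of features. This is the plain baseline whose MIO count the paper's relaxations aim to reduce, not the paper's method.

```python
import itertools

import numpy as np


def sparse_ridge_fit(X, y, tau, gamma):
    """Solve min ||y - X b||^2 + (1/gamma)||b||^2  s.t. ||b||_0 <= tau.

    Brute-force enumeration of supports, standing in for an MIO solver
    (hypothetical helper; assumes tau >= 1 and small p).
    """
    p = X.shape[1]
    best_obj, best_beta = np.inf, np.zeros(p)
    # With a ridge penalty, enlarging the support never worsens the optimum,
    # so it suffices to enumerate supports of size exactly min(tau, p).
    for support in itertools.combinations(range(p), min(tau, p)):
        S = list(support)
        XS = X[:, S]
        # Closed-form ridge solution restricted to the support S.
        bS = np.linalg.solve(XS.T @ XS + np.eye(len(S)) / gamma, XS.T @ y)
        resid = y - XS @ bS
        obj = resid @ resid + (bS @ bS) / gamma
        if obj < best_obj:
            best_obj = obj
            best_beta = np.zeros(p)
            best_beta[S] = bS
    return best_beta


def kfold_cv_error(X, y, tau, gamma, k=5, seed=0):
    """Mean squared validation error over k folds; also returns the number of fits."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sq_err = 0.0
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        beta = sparse_ridge_fit(X[train], y[train], tau, gamma)  # one "MIO" per fold
        r = y[fold] - X[fold] @ beta
        sq_err += r @ r
    return sq_err / n, k


def grid_search(X, y, taus, gammas, k=5):
    """Exhaustive grid search over (tau, gamma); counts every sparse fit performed."""
    results, n_fits = {}, 0
    for tau in taus:
        for gamma in gammas:
            err, fits = kfold_cv_error(X, y, tau, gamma, k)
            results[(tau, gamma)] = err
            n_fits += fits
    best = min(results, key=results.get)
    return best, results, n_fits


# Synthetic example: 8 features, 3 of which carry signal.
rng = np.random.default_rng(1)
n, p = 60, 8
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 3, 5]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

best, results, n_fits = grid_search(X, y, taus=[1, 2, 3, 4], gammas=[0.1, 1.0, 10.0])
print("best (tau, gamma):", best, "sparse fits:", n_fits)  # n_fits is 4 * 3 * 5 = 60
```

Every grid point costs k sparse fits, so even this small grid of four sparsity levels and three regularization values with k = 5 requires 60 of them. With a real MIO solver, each fit can be expensive, which is why reducing the number of MIOs that must be solved matters.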

Cite

@article{arxiv.2306.14851,
  title  = {Optimal Cross-Validation for Sparse Linear Regression},
  author = {Ryan Cory-Wright and Andrés Gómez},
  journal= {arXiv preprint arXiv:2306.14851},
  year   = {2026}
}

Comments

Updated manuscript for revision