Optimal Cross-Validation for Sparse Linear Regression
Abstract
Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To choose hyperparameters that control the sparsity level and amount of regularization, practitioners commonly use k-fold cross-validation. However, cross-validation substantially increases the computational cost of sparse regression, as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To address this computational burden, we derive computationally tractable relaxations of the k-fold cross-validation loss, facilitating hyperparameter selection while solving fewer MIOs in practice. Our computational results demonstrate, across eleven real-world UCI datasets, that exact MIO-based cross-validation can be competitive with mature software packages such as glmnet and L0Learn, particularly when the sample-to-feature ratio is small.
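To make the computational burden concrete, the following is a minimal sketch of naive k-fold cross-validation over a grid of sparsity levels and ridge penalties. It is not the paper's method: exhaustive enumeration of supports stands in for an MIO solver (so it is only viable for a handful of features), and the objective ||y - Xb||^2 + lam * ||b||^2 with at most tau nonzeros is an assumed parameterization. The names `best_subset_ridge`, `kfold_cv_loss`, `select_hyperparameters`, `tau`, and `lam` are hypothetical.

```python
import itertools
import numpy as np

def best_subset_ridge(X, y, tau, lam):
    """Brute-force stand-in for an MIO solver (assumption, not the paper's algorithm).

    Minimizes ||y - X b||^2 + lam * ||b||^2 subject to at most tau nonzeros.
    Enumerating supports of size exactly tau suffices, because enlarging the
    support can never increase the optimal ridge objective.
    """
    p = X.shape[1]
    best_obj, best_beta = np.inf, np.zeros(p)
    for support in itertools.combinations(range(p), tau):
        idx = list(support)
        Xs = X[:, idx]
        # Closed-form ridge solution restricted to the chosen support (lam > 0).
        coef = np.linalg.solve(Xs.T @ Xs + lam * np.eye(tau), Xs.T @ y)
        resid = y - Xs @ coef
        obj = resid @ resid + lam * (coef @ coef)
        if obj < best_obj:
            best_obj = obj
            best_beta = np.zeros(p)
            best_beta[idx] = coef
    return best_beta

def kfold_cv_loss(X, y, tau, lam, n_folds=5, seed=0):
    """Mean squared held-out error; one subset-selection solve per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    total = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta = best_subset_ridge(X[train], y[train], tau, lam)
        resid = y[test_idx] - X[test_idx] @ beta
        total += resid @ resid
    return total / len(y)

def select_hyperparameters(X, y, taus, lams, n_folds=5):
    """Grid search: len(taus) * len(lams) * n_folds subset-selection solves."""
    grid = itertools.product(taus, lams)
    return min(grid, key=lambda tl: kfold_cv_loss(X, y, tl[0], tl[1], n_folds))

# Synthetic illustration: 3 informative features out of 8 (made-up data).
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))
beta_true = np.zeros(8)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(60)

tau_best, lam_best = select_hyperparameters(X, y, taus=(1, 2, 3, 4), lams=(1e-3, 1.0))
print("selected sparsity:", tau_best, "selected ridge penalty:", lam_best)
```

In this sketch every grid point costs `n_folds` combinatorial solves, which is the cost that the relaxations of the cross-validation loss described above are meant to reduce.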
Cite
@article{arxiv.2306.14851,
title = {Optimal Cross-Validation for Sparse Linear Regression},
author = {Ryan Cory-Wright and Andrés Gómez},
journal = {arXiv preprint arXiv:2306.14851},
year = {2026}
}
Comments
Updated manuscript for revision