
Stability Regularized Cross-Validation

Optimization and Control 2026-02-04 v2 Machine Learning Machine Learning

Abstract

We revisit the problem of ensuring strong test set performance via cross-validation, and propose a nested k-fold cross-validation scheme that selects hyperparameters by minimizing a weighted sum of the usual cross-validation metric and an empirical model-stability measure. The weight on the stability term is itself chosen via a nested cross-validation procedure. This reduces the risk of strong validation set performance and poor test set performance due to instability. We benchmark our procedure on a suite of 13 real-world datasets, and find that, compared to k-fold cross-validation over the same hyperparameters, it improves the out-of-sample MSE for sparse ridge regression and CART by 4% and 2% respectively on average, but has no impact on XGBoost. It also reduces the user's out-of-sample disappointment, sometimes significantly. For instance, for sparse ridge regression, the nested k-fold cross-validation error is on average 0.9% lower than the test set error, while the k-fold cross-validation error is 21.8% lower than the test error. Thus, for unstable models such as sparse regression and CART, our approach improves test set performance and reduces out-of-sample disappointment.
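The selection rule described in the abstract (cross-validation error plus a weighted stability term) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions the abstract does not confirm: plain ridge regression stands in for sparse ridge regression, the stability measure is a hypothetical proxy (the mean squared gap between each fold model's predictions and the full-data model's predictions), and the weight on the stability term is passed in as a fixed number rather than chosen by the nested cross-validation procedure the paper uses. The names `ridge_fit`, `cv_error_and_instability`, and `select_lambda` are illustrative only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error_and_instability(X, y, lam, k=5, seed=0):
    # Returns (mean k-fold validation MSE, assumed instability proxy).
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    beta_full = ridge_fit(X, y, lam)  # model trained on all data
    cv_errs, instab = [], []
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        beta = ridge_fit(X[train_mask], y[train_mask], lam)
        # Usual cross-validation metric: MSE on the held-out fold.
        cv_errs.append(np.mean((y[val_idx] - X[val_idx] @ beta) ** 2))
        # ASSUMPTION: instability measured as how far the fold model's
        # predictions move away from the full-data model's predictions.
        instab.append(np.mean((X @ beta - X @ beta_full) ** 2))
    return float(np.mean(cv_errs)), float(np.mean(instab))

def select_lambda(X, y, lams, weight, k=5, seed=0):
    # Pick the hyperparameter minimizing CV error + weight * instability.
    # ASSUMPTION: `weight` is fixed here; the paper selects it by nested CV.
    scores = []
    for lam in lams:
        err, ins = cv_error_and_instability(X, y, lam, k, seed)
        scores.append(err + weight * ins)
    return lams[int(np.argmin(scores))], scores
```

With `weight=0.0` the rule reduces to ordinary k-fold cross-validation over the same hyperparameter grid, which is the baseline the abstract compares against. Larger weights penalize hyperparameter values whose fitted models change more when a fold is removed.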


Cite

@article{arxiv.2505.06927,
  title  = {Stability Regularized Cross-Validation},
  author = {Ryan Cory-Wright and Andrés Gómez},
  journal= {arXiv preprint arXiv:2505.06927},
  year   = {2026}
}

Comments

Some of this material previously appeared in 2306.14851v2, which we have split into two papers (this one and 2306.14851v3) because it contained two ideas that need separate papers.
