Cross-Validation, Risk Estimation, and Model Selection
Methodology
2019-09-27 v1
Abstract
Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting, and find that cross-validation is asymptotically uninformative about the expected test error of any given predictive rule, but allows for asymptotically consistent model selection. The reason for this phenomenon is that the leading-order error term of cross-validation doesn't depend on the model being evaluated, and so cancels out when we compare two models.
Cite
@article{arxiv.1909.11696,
title = {Cross-Validation, Risk Estimation, and Model Selection},
author = {Stefan Wager},
journal = {arXiv preprint arXiv:1909.11696},
year = {2019}
}
Comments
This note was prepared as a comment on a paper by Rosset and Tibshirani, forthcoming in the Journal of the American Statistical Association.