Cross-Validation, Risk Estimation, and Model Selection

Methodology 2019-09-27 v1

Abstract

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting, and find that cross-validation is asymptotically uninformative about the expected test error of any given predictive rule, but allows for asymptotically consistent model selection. The reason for this phenomenon is that the leading-order error term of cross-validation doesn't depend on the model being evaluated, and so cancels out when we compare two models.
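The cancellation described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under simplifying assumptions that are not from the note: two fixed, hypothetical predictive rules are scored on a shared hold-out sample rather than fitted and cross-validated, and the data-generating process (`sin(x)` plus Gaussian noise) is invented for illustration. Because both rules are scored on the same noisy responses, the fluctuation of the average squared noise enters both error estimates and drops out of their difference.

```python
import math
import random
import statistics


def holdout_errors(n, rng, noise_sd=1.0):
    """Mean squared error of two fixed rules on one shared sample of size n."""
    err_a = 0.0
    err_b = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Assumed data-generating process (illustrative, not from the note).
        y = math.sin(x) + rng.gauss(0.0, noise_sd)
        err_a += (y - x) ** 2        # hypothetical rule A: predict x
        err_b += (y - 0.8 * x) ** 2  # hypothetical rule B: predict 0.8 * x
    return err_a / n, err_b / n


def simulate(reps=500, n=200, seed=0):
    """Spread of a single error estimate vs. spread of the A-minus-B difference."""
    rng = random.Random(seed)
    results = [holdout_errors(n, rng) for _ in range(reps)]
    err_a = [a for a, _ in results]
    diff = [a - b for a, b in results]
    return statistics.stdev(err_a), statistics.stdev(diff)


sd_single, sd_diff = simulate()
print(f"sd of rule A's error estimate:  {sd_single:.4f}")
print(f"sd of the A-minus-B difference: {sd_diff:.4f}")
```

In this setup the difference between the two error estimates should vary far less across replications than either estimate alone, which is the sense in which a noisy risk estimate can still rank models reliably.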

Cite

@article{arxiv.1909.11696,
  title  = {Cross-Validation, Risk Estimation, and Model Selection},
  author = {Stefan Wager},
  journal= {arXiv preprint arXiv:1909.11696},
  year   = {2019}
}

Comments

This note was prepared as a comment on a paper by Rosset and Tibshirani, forthcoming in the Journal of the American Statistical Association.