Optimal cross-validation in density estimation with the $L^2$-loss

Alain Celisse

doi:10.1214/14-AOS1240

Statistics Theory, 2014-10-02, v4

Abstract

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is on the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also make it possible to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with $p=1$, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size $n$, optimality is achieved for $p$ large enough [with $p/n=o(1)$] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as $p/n$ is suitably related to the rate of convergence of the best estimator in the collection: (i) $p/n\to 1$ as $n\to+\infty$ with a parametric rate, and (ii) $p/n=o(1)$ with some nonparametric estimators. These theoretical results are validated by simulation experiments.
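To make the leave-$p$-out definition concrete, here is a minimal brute-force sketch in Python. It is an illustration under stated assumptions, not the closed-form expression derived in the paper: it assumes data in $[0,1]$, a regular histogram as the projection estimator, and the usual $L^2$ contrast $\|t\|^2 - 2t(x)$. The function name `lpo_risk` and its parameters are hypothetical. The loop enumerates all $\binom{n}{p}$ splits, which is exactly the cost that a closed-form expression avoids.

```python
from itertools import combinations

import numpy as np


def lpo_risk(x, n_bins, p):
    """Brute-force leave-p-out estimate of the L2 risk of a regular
    histogram on [0, 1] (illustrative sketch; assumes 0 <= x_i <= 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    width = 1.0 / n_bins
    # Bin index of every observation; x == 1.0 goes into the last bin.
    bin_idx = np.minimum((x * n_bins).astype(int), n_bins - 1)

    total = 0.0
    n_splits = 0
    for test in combinations(range(n), p):
        test = list(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        # Histogram heights fitted on the n - p training points.
        counts = np.bincount(bin_idx[train_mask], minlength=n_bins)
        heights = counts / ((n - p) * width)
        # L2 contrast: squared norm of the estimator minus twice its
        # average value at the p held-out points.
        sq_norm = np.sum(heights ** 2) * width
        test_term = heights[bin_idx[test]].mean()
        total += sq_norm - 2.0 * test_term
        n_splits += 1
    return total / n_splits
```

For model selection, one would evaluate `lpo_risk` for several values of `n_bins` and keep the minimizer. The enumeration is only feasible for small `n` and `p`.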

Cite

@article{arxiv.0811.0802,
  title  = {Optimal cross-validation in density estimation with the $L^2$-loss},
  author = {Alain Celisse},
  journal= {arXiv preprint arXiv:0811.0802},
  year   = {2014}
}

Comments

Published at http://dx.doi.org/10.1214/14-AOS1240 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
