Related papers: Cross-validation

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot, Alain Celisse

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates, Trevor Hastie, Robert Tibshirani

A Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

A bias correction for the minimum error rate in cross-validation

Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter.…

Applications · Statistics 2009-08-21 Ryan J. Tibshirani, Robert Tibshirani

Model selection for estimation of causal parameters

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. In causal inference, the optimal choice of estimator…

Methodology · Statistics 2021-07-07 Dominik Rothenhäusler

Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average

We investigate the accuracy of the two most common estimators for the maximum expected value of a general set of random variables: a generalization of the maximum sample average, and cross validation. No unbiased estimator exists and we…

Machine Learning · Statistics 2013-03-04 Hado van Hasselt

Robust importance-weighted cross-validation under sample selection bias

Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where…

Machine Learning · Computer Science 2019-08-28 Wouter M. Kouw, Jesse H. Krijthe, Marco Loog

Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning

We present a methodology for model evaluation and selection where the sampling mechanism violates the i.i.d. assumption. Our methodology involves a formulation of the bias between the standard Cross-Validation (CV) estimator and the mean…

Methodology · Statistics 2025-03-14 Oren Yuval, Saharon Rosset

Towards new cross-validation-based estimators for Gaussian process regression: efficient adjoint computation of gradients

We consider the problem of estimating the parameters of the covariance function of a Gaussian process by cross-validation. We suggest using new cross-validation criteria derived from the literature of scoring rules. We also provide an…

Computation · Statistics 2020-08-07 Sébastien Petit, Julien Bect, Sébastien da Veiga, Paul Feliot, Emmanuel Vazquez

Bootstrapping the Cross-Validation Estimate

Cross-validation is a widely used technique for evaluating the performance of prediction models, ranging from simple binary classification to complex precision medicine strategies. It helps correct for optimism bias in error estimates,…

Methodology · Statistics 2025-09-05 Bryan Cai, Yuanhui Luo, Xinzhou Guo, Fabio Pellegrini, Menglan Pang, Carl de Moor, Changyu Shen, Vivek Charu, Lu Tian

Cross-Validated Off-Policy Evaluation

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides…

Machine Learning · Computer Science 2024-12-23 Matej Cief, Branislav Kveton, Michal Kompan

Correcting for selection bias via cross-validation in the classification of microarray data

There is increasing interest in the use of diagnostic rules based on microarray data. These rules are formed by considering the expression levels of thousands of genes in tissue samples taken on patients of known classification with respect…

Statistics Theory · Mathematics 2008-12-18 G. J. McLachlan, J. Chevelu, J. Zhu

On the cross-validation bias due to unsupervised pre-processing

Cross-validation is the de facto standard for predictive model evaluation and selection. In proper use, it provides an unbiased estimate of a model's predictive performance. However, data sets often undergo various forms of data-dependent…

Methodology · Statistics 2023-01-18 Amit Moscovich, Saharon Rosset

Nested cross-validation when selecting classifiers is overzealous for most practical applications

When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset and the best set of hyperparameters for the chosen model. The usual approach is to…

Machine Learning · Computer Science 2018-09-26 Jacques Wainer, Gavin Cawley

Pre-validation Revisited

Pre-validation is a way to build a prediction model with two datasets of significantly different feature dimensions. Previous work showed that the asymptotic distribution of the resulting test statistic for the pre-validated predictor…

Methodology · Statistics 2025-05-23 Jing Shang, Sourav Chatterjee, Trevor Hastie, Robert Tibshirani

A Theory of Cross-Validation Error

This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney

Cross-validation Approaches for Multi-study Predictions

We consider prediction in multiple studies with potential differences in the relationships between predictors and outcomes. Our objective is to integrate data from multiple studies to develop prediction models for unseen studies. We propose…

Methodology · Statistics 2024-07-23 Boyu Ren, Prasad Patil, Francesca Dominici, Giovanni Parmigiani, Lorenzo Trippa

Bootstrap Bias Corrected Cross Validation applied to Super Learning

The super learner algorithm can be applied to combine the results of multiple base learners to improve the quality of predictions. The default method for verification of super learner results is by nested cross validation. It has been proposed by…

Machine Learning · Computer Science 2020-03-19 Krzysztof Mnich, Agnieszka Kitlas Golińska, Aneta Polewko-Klim, Witold R. Rudnicki