Related papers: Estimating Subagging by cross-validation

Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of \cite{KR99}…

Machine Learning · Statistics 2010-11-02 Matthieu Cornec

Concentration inequalities of the cross-validation estimate for stable predictors

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for stable predictors in the context of risk assessment. The notion of stability has been first introduced by \cite{DEWA79}…

Machine Learning · Statistics 2010-11-24 Matthieu Cornec

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates, Trevor Hastie, Robert Tibshirani

A bias correction for the minimum error rate in cross-validation

Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter.…

Applications · Statistics 2009-08-21 Ryan J. Tibshirani, Robert Tibshirani

Distributional bias compromises leave-one-out cross-validation

Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach…

Methodology · Statistics 2025-03-25 George I. Austin, Itsik Pe'er, Tal Korem

Bagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples

Cross-validation is a well-known and widely used bandwidth selection method in nonparametric regression estimation. However, this technique has two remarkable drawbacks: (i) the large variability of the selected bandwidths, and (ii) the…

Methodology · Statistics 2021-05-11 D. Barreiro-Ures, R. Cao, M. Francisco-Fernández

An Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

Cross-validation

This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given…

Statistics Theory · Mathematics 2017-03-10 Sylvain Arlot

Cross-validation Confidence Intervals for Test Error

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact…

Machine Learning · Statistics 2020-11-03 Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey

Concentration inequalities for leave-one-out cross validation

In this article we prove that estimator stability is enough to show that leave-one-out cross validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond…

Statistics Theory · Mathematics 2023-10-17 Benny Avelin, Lauri Viitasaari

A universal approximate cross-validation criterion and its asymptotic distribution

A general framework is that the estimators of a distribution are obtained by minimizing a function (the estimating function) and they are assessed through another function (the assessment function). The estimating and assessment functions…

Statistics Theory · Mathematics 2022-01-14 Daniel Commenges, Cécile Proust-Lima, Cécilia Samieri, Benoit Liquet

The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits

Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds…

Statistics Theory · Mathematics 2026-01-09 Ido Nachum, Rüdiger Urbanke, Thomas Weinberger

The use of cross validation in the analysis of designed experiments

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned…

Applications · Statistics 2025-06-18 Maria L. Weese, Byran J. Smucker, David J. Edwards

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Subbagging Variable Selection for Big Data

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li, Xuan Liang, Tao Zou

Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…

Machine Learning · Computer Science 2025-08-28 Afonso Martini Spezia, Thomas Fontanari, Mariana Recamonde-Mendoza

Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning

We present a methodology for model evaluation and selection where the sampling mechanism violates the i.i.d. assumption. Our methodology involves a formulation of the bias between the standard Cross-Validation (CV) estimator and the mean…

Methodology · Statistics 2025-03-14 Oren Yuval, Saharon Rosset

Robust importance-weighted cross-validation under sample selection bias

Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where…

Machine Learning · Computer Science 2019-08-28 Wouter M. Kouw, Jesse H. Krijthe, Marco Loog

Cross-validation failure: small sample sizes lead to large error bars

Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is…

Quantitative Methods · Quantitative Biology 2017-06-26 Gaël Varoquaux

Confidence intervals for the Cox model test error from cross-validation

Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test…

Methodology · Statistics 2023-10-10 Min Woo Sun, Robert Tibshirani