Related papers: Cross-Validation for Correlated Data

Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning

We present a methodology for model evaluation and selection where the sampling mechanism violates the i.i.d. assumption. Our methodology involves a formulation of the bias between the standard Cross-Validation (CV) estimator and the mean…

Methodology · Statistics 2025-03-14 Oren Yuval, Saharon Rosset

Confidence intervals for the Cox model test error from cross-validation

Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test…

Methodology · Statistics 2023-10-10 Min Woo Sun, Robert Tibshirani

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates, Trevor Hastie, Robert Tibshirani

Is K-fold cross validation the best model selection method for Machine Learning?

As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine…

Machine Learning · Statistics 2026-04-24 Juan M Gorriz, R. Martin Clemente, F Segovia, J Ramirez, A Ortiz, J. Suckling

Is Cross-Validation the Gold Standard to Evaluate Model Performance?

Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In…

Statistics Theory · Mathematics 2024-08-22 Garud Iyengar, Henry Lam, Tianyu Wang

Estimating the Prediction Performance of Spatial Models via Spatial k-Fold Cross Validation

In machine learning one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information data sets are an example where the data points have stronger dependencies among…

Applications · Statistics 2020-06-01 Jonne Pohjankukka, Tapio Pahikkala, Paavo Nevalainen, Jukka Heikkonen

Approximate Cross-Validation for Structured Models

Many modern data analyses benefit from explicitly modeling dependence structure in data -- such as measurements across time or space, ordered words in a sentence, or genes in a genome. A gold standard evaluation technique is structured…

Machine Learning · Statistics 2020-12-02 Soumya Ghosh, William T. Stephenson, Tin D. Nguyen, Sameer K. Deshpande, Tamara Broderick

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager

Cross-Validation for Unsupervised Learning

Cross-validation (CV) is a popular method for model-selection. Unfortunately, it is not immediately obvious how to apply CV to unsupervised or exploratory contexts. This thesis discusses some extensions of cross-validation to unsupervised…

Methodology · Statistics 2009-09-17 Patrick O. Perry

The use of cross validation in the analysis of designed experiments

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned…

Applications · Statistics 2025-06-18 Maria L. Weese, Byran J. Smucker, David J. Edwards

Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so…

Computation and Language · Computer Science 2018-06-20 Henry B. Moss, David S. Leslie, Paul Rayson

Approximate cross-validation formula for Bayesian linear regression

Cross-validation (CV) is a technique for evaluating the ability of statistical models/learning systems based on a given data set. Despite its wide applicability, the rather heavy computational cost can prevent its use as the system size…

Machine Learning · Statistics 2016-10-26 Yoshiyuki Kabashima, Tomoyuki Obuchi, Makoto Uemura

Cross-validation for change-point regression: pitfalls and solutions

Cross-validation is the standard approach for tuning parameter selection in many non-parametric regression problems. However, its use is less common in change-point regression, perhaps as its prediction error-based criterion may appear to…

Methodology · Statistics 2024-02-13 Florian Pein, Rajen D. Shah

On the bias of K-fold cross validation with stable learners

This paper investigates the efficiency of the K-fold cross-validation (CV) procedure and a debiased version thereof as a means of estimating the generalization risk of a learning algorithm. We work under the general assumption of uniform…

Statistics Theory · Mathematics 2023-06-13 Anass Aghbalou, François Portier, Anne Sabourin

The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits

Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds…

Statistics Theory · Mathematics 2026-01-09 Ido Nachum, Rüdiger Urbanke, Thomas Weinberger

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Consistency of cross validation for comparing regression procedures

Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for…

Statistics Theory · Mathematics 2008-12-18 Yuhong Yang

Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data

It is crucial to assess the predictive performance of a model to establish its practicality and relevance in real-world scenarios, particularly for high-dimensional data analysis. Among data splitting or resampling methods, cross-validation…

Methodology · Statistics 2025-11-26 Iris Ivy Gauran, Hernando Ombao, Zhaoxia Yu

Approximate Cross-validation: Guarantees for Model Assessment and Selection

Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets.…

Machine Learning · Statistics 2020-06-12 Ashia Wilson, Maximilian Kasy, Lester Mackey

Leave Zero Out: Towards a No-Cross-Validation Approach for Model Selection

As the main workhorse for model selection, Cross Validation (CV) has achieved an empirical success due to its simplicity and intuitiveness. However, despite its ubiquitous role, CV often falls into the following notorious dilemmas. On the…

Machine Learning · Computer Science 2020-12-29 Weikai Li, Chuanxing Geng, Songcan Chen