Related papers: Efficient algorithms for decision tree cross-valid…

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Fast Cross-Validation for Incremental Learning

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally…

Machine Learning · Statistics 2015-07-02 Pooria Joulani , András György , Csaba Szepesvári

Don't Waste Your Time: Early Stopping Cross-Validation

State-of-the-art automated machine learning systems for tabular data often employ cross-validation; ensuring that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold…

Machine Learning · Computer Science 2024-08-05 Edward Bergman , Lennart Purucker , Frank Hutter

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot , Alain Celisse

Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…

Machine Learning · Computer Science 2025-08-28 Afonso Martini Spezia , Thomas Fontanari , Mariana Recamonde-Mendoza

The use of cross validation in the analysis of designed experiments

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned…

Applications · Statistics 2025-06-18 Maria L. Weese , Byran J. Smucker , David J. Edwards

From Theory to Practice: Implementing and Evaluating e-Fold Cross-Validation

This paper introduces e-fold cross-validation, an energy-efficient alternative to k-fold cross-validation. It dynamically adjusts the number of folds based on a stopping criterion. The criterion checks after each fold whether the standard…

Machine Learning · Computer Science 2024-10-29 Christopher Mahlich , Tobias Vente , Joeran Beel

Cross-Validation in Penalized Linear Mixed Models: Addressing Common Implementation Pitfalls

In this paper, we develop an implementation of cross-validation for penalized linear mixed models. While these models have been proposed for correlated high-dimensional data, the current literature implicitly assumes that tuning parameter…

Methodology · Statistics 2025-03-19 Tabitha K. Peter , Patrick J. Breheny

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager

Nested cross-validation when selecting classifiers is overzealous for most practical applications

When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset \emph{and} the best set of hyperparameters for the chosen model. The usual approach is to…

Machine Learning · Computer Science 2018-09-26 Jacques Wainer , Gavin Cawley

Reinforced Decision Trees

In order to speed-up classification models when facing a large number of categories, one usual approach consists in organizing the categories in a particular structure, this structure being then used as a way to speed-up the prediction…

Machine Learning · Computer Science 2015-11-26 Aurélia Léon , Ludovic Denoyer

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates , Trevor Hastie , Robert Tibshirani

Network cross-validation by edge sampling

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…

Methodology · Statistics 2020-05-04 Tianxi Li , Elizaveta Levina , Ji Zhu

Network Cross-Validation and Model Selection via Subsampling

Complex and larger networks are becoming increasingly prevalent in scientific applications in various domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the…

Methodology · Statistics 2026-03-12 Sayan Chakrabarty , Srijan Sengupta , Yuguo Chen

Cross-validation

This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given…

Statistics Theory · Mathematics 2017-03-10 Sylvain Arlot

A Theory of Cross-Validation Error

This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney

Cross-Validation and Uncertainty Determination for Randomized Neural Networks with Applications to Mobile Sensors

Randomized artificial neural networks such as extreme learning machines provide an attractive and efficient method for supervised learning under limited computing ressources and green machine learning. This especially applies when equipping…

Machine Learning · Statistics 2022-01-02 Ansgar Steland , Bart E. Pieters

Cross validation approaches for penalized Cox regression

Cross validation is commonly used for selecting tuning parameters in penalized regression, but its use in penalized Cox regression models has received relatively little attention in the literature. Due to its partial likelihood…

Methodology · Statistics 2026-05-13 Biyue Dai , Patrick Breheny

Fast and Informative Model Selection using Learning Curve Cross-Validation

Common cross-validation (CV) methods like k-fold cross-validation or Monte-Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing on the remaining…

Machine Learning · Computer Science 2021-11-30 Felix Mohr , Jan N. van Rijn

Cross-Validation for Unsupervised Learning

Cross-validation (CV) is a popular method for model-selection. Unfortunately, it is not immediately obvious how to apply CV to unsupervised or exploratory contexts. This thesis discusses some extensions of cross-validation to unsupervised…

Methodology · Statistics 2009-09-17 Patrick O. Perry