Related papers: Fast Cross-Validation via Sequential Testing

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager

Consistency of cross validation for comparing regression procedures

Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for…

Statistics Theory · Mathematics 2008-12-18 Yuhong Yang

Nested cross-validation when selecting classifiers is overzealous for most practical applications

When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset and the best set of hyperparameters for the chosen model. The usual approach is to…

Machine Learning · Computer Science 2018-09-26 Jacques Wainer, Gavin Cawley

Bi-cross-validation for factor analysis

Factor analysis is over a century old, but it is still problematic to choose the number of factors for a given data set. The scree test is popular but subjective. The best performing objective methods are recommended on the basis of…

Methodology · Statistics 2015-11-12 A. B. Owen, J. Wang

Cross-validation for change-point regression: pitfalls and solutions

Cross-validation is the standard approach for tuning parameter selection in many non-parametric regression problems. However its use is less common in change-point regression, perhaps as its prediction error-based criterion may appear to…

Methodology · Statistics 2024-02-13 Florian Pein, Rajen D. Shah

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot, Alain Celisse

Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes

Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to…

Machine Learning · Computer Science 2022-09-13 Sigrun May, Sven Hartmann, Frank Klawonn

Sequential Data-Adaptive Bandwidth Selection by Cross-Validation for Nonparametric Prediction

We consider the problem of bandwidth selection by cross-validation from a sequential point of view in a nonparametric regression model. Having in mind that in applications one often aims at estimation, prediction and change detection…

Statistics Theory · Mathematics 2018-03-20 Ansgar Steland

Cross-validation in nonparametric regression with outliers

A popular data-driven method for choosing the bandwidth in standard kernel regression is cross-validation. Even when there are outliers in the data, robust kernel regression can be used to estimate the unknown regression curve [Robust and…

Statistics Theory · Mathematics 2007-06-13 Denis Heng-Yan Leung

Fast Cross-Validation for Incremental Learning

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally…

Machine Learning · Statistics 2015-07-02 Pooria Joulani, András György, Csaba Szepesvári

Cross-Validation for Unsupervised Learning

Cross-validation (CV) is a popular method for model-selection. Unfortunately, it is not immediately obvious how to apply CV to unsupervised or exploratory contexts. This thesis discusses some extensions of cross-validation to unsupervised…

Methodology · Statistics 2009-09-17 Patrick O. Perry

On the cross-validation bias due to unsupervised pre-processing

Cross-validation is the de facto standard for predictive model evaluation and selection. In proper use, it provides an unbiased estimate of a model's predictive performance. However, data sets often undergo various forms of data-dependent…

Methodology · Statistics 2023-01-18 Amit Moscovich, Saharon Rosset

Network Cross-Validation and Model Selection via Subsampling

Complex and larger networks are becoming increasingly prevalent in scientific applications in various domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the…

Methodology · Statistics 2026-03-12 Sayan Chakrabarty, Srijan Sengupta, Yuguo Chen

A Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining a honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery

Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a "best" model. For example, the method of $k$-nearest neighbors requires the user to choose $k$, the number of neighbors,…

Applications · Statistics 2012-03-01 Hui Shen, William J. Welch, Jacqueline M. Hughes-Oliver

Parameter Selection Algorithm For Continuous Variables

In this article, we propose a new algorithm for supervised learning methods, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, an ideal…

Applications · Statistics 2017-01-23 Peyman Tavallali, Marianne Razavi, Sean Brady

Network cross-validation by edge sampling

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…

Methodology · Statistics 2020-05-04 Tianxi Li, Elizaveta Levina, Ji Zhu

Cross-validation-based optimal feature selection for linear SVM classification

This paper addresses feature subset selection for Support Vector Machines (SVMs) based on the cross-validation criterion. Unlike statistical criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion…

Optimization and Control · Mathematics 2026-05-11 Masaharu Mori, Shunnosuke Ikeda, Ryuta Tamura, Yuichi Takano, Ryuhei Miyashiro

Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…

Machine Learning · Computer Science 2025-08-28 Afonso Martini Spezia, Thomas Fontanari, Mariana Recamonde-Mendoza