Related papers: Cross-validation Approaches for Multi-study Predic…

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Cross-Validation for Nonlinear Mixed Effects Models

Cross-validation is frequently used for model selection in a variety of applications. However, it is difficult to apply cross-validation to mixed effects models (including nonlinear mixed effects models or NLME models) due to the fact that…

Methodology · Statistics 2013-05-24 Emily Colby, Eric Bair

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot, Alain Celisse

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates, Trevor Hastie, Robert Tibshirani

Cross validation approaches for penalized Cox regression

Cross validation is commonly used for selecting tuning parameters in penalized regression, but its use in penalized Cox regression models has received relatively little attention in the literature. Due to its partial likelihood…

Methodology · Statistics 2026-05-13 Biyue Dai, Patrick Breheny

Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation

It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets and applying standard statistical learning methods…

Machine Learning · Statistics 2021-10-05 Gabriel Loewinger, Rolando Acosta Nunez, Rahul Mazumder, Giovanni Parmigiani

Cross-validation

This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given…

Statistics Theory · Mathematics 2017-03-10 Sylvain Arlot

Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling

Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies…

Machine Learning · Statistics 2022-07-14 Cathy Shyr, Pragya Sur, Giovanni Parmigiani, Prasad Patil

A Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…

Machine Learning · Computer Science 2025-08-28 Afonso Martini Spezia, Thomas Fontanari, Mariana Recamonde-Mendoza

Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting

Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked…

Machine Learning · Computer Science 2025-12-17 Hilaf Hasson, Danielle C. Maddix, Yuyang Wang, Gaurav Gupta, Youngsuk Park

Distributional bias compromises leave-one-out cross-validation

Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach…

Methodology · Statistics 2025-03-25 George I. Austin, Itsik Pe'er, Tal Korem

Cross-conformal predictors

This note introduces the method of cross-conformal prediction, which is a hybrid of the methods of inductive conformal prediction and cross-validation, and studies its validity and predictive efficiency empirically.

Machine Learning · Statistics 2012-08-06 Vladimir Vovk

Risk-consistency of cross-validation with lasso-type procedures

The lasso and related sparsity inducing algorithms have been the target of substantial theoretical and applied research. Correspondingly, many results are known about their behavior for a fixed or optimally chosen tuning parameter specified…

Statistics Theory · Mathematics 2016-06-23 Darren Homrighausen, Daniel J. McDonald

Efficient algorithms for decision tree cross-validation

Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational…

Machine Learning · Computer Science 2007-05-23 Hendrik Blockeel, Jan Struyf

A Bayes interpretation of stacking for M-complete and M-open settings

In M-open problems where no true model can be conceptualized, it is common to back off from modeling and merely seek good prediction. Even in M-complete problems, taking a predictive approach can be very useful. Stacking is a model…

Statistics Theory · Mathematics 2016-02-17 Tri Le, Bertrand Clarke

Recursive Partitioning for Heterogeneous Causal Effects

In this paper we study the problems of estimating heterogeneity in causal effects in experimental or observational studies and conducting inference about the magnitude of the differences in treatment effects across subsets of the…

Machine Learning · Statistics 2022-06-08 Susan Athey, Guido Imbens

Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies

Causal inference starts with a simple idea: compare groups that differ by treatment, not much else. Traditionally, similar groups are constructed using only observed covariates; however, it remains a long-standing challenge to incorporate…

Methodology · Statistics 2025-11-21 Ying Jin, José Zubizarreta

Cross-Validation and Uncertainty Determination for Randomized Neural Networks with Applications to Mobile Sensors

Randomized artificial neural networks such as extreme learning machines provide an attractive and efficient method for supervised learning under limited computing resources and green machine learning. This especially applies when equipping…

Machine Learning · Statistics 2022-01-02 Ansgar Steland, Bart E. Pieters