Related papers: Test Error Estimation after Model Selection Using …

A bias correction for the minimum error rate in cross-validation

Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at that same value of the tuning parameter.…

Applications · Statistics 2009-08-21 Ryan J. Tibshirani , Robert Tibshirani

Bootstrapping the Cross-Validation Estimate

Cross-validation is a widely used technique for evaluating the performance of prediction models, ranging from simple binary classification to complex precision medicine strategies. It helps correct for optimism bias in error estimates,…

Methodology · Statistics 2025-09-05 Bryan Cai , Yuanhui Luo , Xinzhou Guo , Fabio Pellegrini , Menglan Pang , Carl de Moor , Changyu Shen , Vivek Charu , Lu Tian

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates , Trevor Hastie , Robert Tibshirani

Selective inference after cross-validation

This paper describes a method for performing inference on models chosen by cross-validation. When the test error being minimized in cross-validation is a residual sum of squares it can be written as a quadratic form. This allows us to apply…

Methodology · Statistics 2015-12-01 Joshua R. Loftus

Model selection by cross-validation in an expectile linear regression

For linear models that may have asymmetric errors, we study variable selection by cross-validation. The data are split into training and validation sets, with the number of observations in the validation set much larger than in the training…

Methodology · Statistics 2026-01-16 Bilel Bousselmi , Gabriela Ciuperca

Model selection for estimation of causal parameters

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. In causal inference, the optimal choice of estimator…

Methodology · Statistics 2021-07-07 Dominik Rothenhäusler

A Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining a honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan , Vincent Z. Yu , Viswanath Devanarayan , Lu Tian

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for…

Machine Learning · Computer Science 2020-11-12 Sebastian Raschka

Sample Selection Bias Correction Theory

This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to…

Machine Learning · Computer Science 2008-12-18 Corinna Cortes , Mehryar Mohri , Michael Riley , Afshin Rostamizadeh

An Algorithmic Framework for Computing Validation Performance Bounds by Using Suboptimal Models

Practical model building processes are often time-consuming because many different models must be trained and validated. In this paper, we introduce a novel algorithm that can be used for computing the lower and the upper bounds of model…

Machine Learning · Statistics 2014-02-11 Yoshiki Suzuki , Kohei Ogawa , Yuki Shinmura , Ichiro Takeuchi

Improving prediction accuracy by choosing resampling distribution via cross-validation

In a regression model, prediction is typically performed after model selection. The large variability in the model selection makes the prediction unstable. Thus, it is essential to reduce the variability in model selection and improve…

Computation · Statistics 2024-04-11 Wataru Yoshida , Kei Hirose

Estimation of prediction error with known covariate shift

In supervised learning, the estimation of prediction error on unlabeled test data is an important task. Existing methods are usually built on the assumption that the training and test data are sampled from the same distribution, which is…

Methodology · Statistics 2022-09-30 Hui Xu , Robert Tibshirani

Post-Selection Confidence Bounds for Prediction Performance

In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks that need careful consideration. Typically, model selection…

Machine Learning · Statistics 2023-02-06 Pascal Rink , Werner Brannath

Blinded sample size re-estimation accounting for uncertainty in mid-trial estimation

For randomized controlled trials to be conclusive, it is important to set the target sample size accurately at the design stage. Comparing two normal populations, the sample size calculation requires specification of the variance other than…

Methodology · Statistics 2026-02-04 Hirotada Maeda , Satoshi Hattori , Tim Friede

An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression

In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice…

Machine Learning · Statistics 2019-04-01 Jean Feng , Noah Simon

Unbiased Test Error Estimation in the Poisson Means Problem via Coupled Bootstrap Techniques

We propose a coupled bootstrap (CB) method for the test error of an arbitrary algorithm that estimates the mean in a Poisson sequence, often called the Poisson means problem. The idea behind our method is to generate two carefully-designed…

Methodology · Statistics 2024-08-20 Natalia L. Oliveira , Jing Lei , Ryan J. Tibshirani

On confidence intervals centered on bootstrap smoothed estimators

Bootstrap smoothed (bagged) estimators have been proposed as an improvement on estimators found after preliminary data-based model selection. Efron, 2014, derived a widely applicable formula for a delta method approximation to the standard…

Methodology · Statistics 2019-07-11 Paul Kabaila , Christeen Wijethunga

Efficient estimation and correction of selection-induced bias with order statistics

Model selection aims to identify a sufficiently well performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible…

Methodology · Statistics 2024-08-08 Yann McLatchie , Aki Vehtari

Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free Inference

Several new methods have been proposed for performing valid inference after model selection. An older method is sampling splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting…

Statistics Theory · Mathematics 2018-04-04 Alessandro Rinaldo , Larry Wasserman , Max G'Sell , Jing Lei

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei