Related papers: A study of pre-validation

Pre-validation Revisited

Pre-validation is a way to build a prediction model with two datasets of significantly different feature dimensions. Previous work showed that the asymptotic distribution of the resulting test statistic for the pre-validated predictor…

Methodology · Statistics 2025-05-23 Jing Shang, Sourav Chatterjee, Trevor Hastie, Robert Tibshirani

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager

Conditional independence testing: a predictive perspective

Conditional independence testing is a key problem required by many machine learning and statistics tools. In particular, it is one way of evaluating the usefulness of some features on a supervised prediction problem. We propose a novel…

Machine Learning · Statistics 2019-08-02 Marco Henrique de Almeida Inácio, Rafael Izbicki, Rafael Bassi Stern

A Permutation Test on Complex Sample Data

Permutation tests are a distribution free way of performing hypothesis tests. These tests rely on the condition that the observed data are exchangeable among the groups being tested under the null hypothesis. This assumption is easily…

Methodology · Statistics 2017-12-14 Daniell Toth

Prepivoted permutation tests

We present a general approach to constructing permutation tests that are both exact for the null hypothesis of equality of distributions and asymptotically correct for testing equality of parameters of distributions while allowing the…

Statistics Theory · Mathematics 2021-07-12 Colin B. Fogarty

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Evaluating Fairness Using Permutation Tests

Machine learning models are central to people's lives and impact society in ways as fundamental as determining how people access information. The gravity of these models imparts a responsibility to model developers to ensure that they are…

Applications · Statistics 2020-07-13 Cyrus DiCiccio, Sriram Vasudevan, Kinjal Basu, Krishnaram Kenthapadi, Deepak Agarwal

Permutation testing in high-dimensional linear models: an empirical investigation

Permutation testing in linear models, where the number of nuisance coefficients is smaller than the sample size, is a well-studied topic. The common approach of such tests is to permute residuals after regressing on the nuisance covariates.…

Methodology · Statistics 2020-10-09 Jesse Hemerik, Magne Thoresen, Livio Finos

On the cross-validation bias due to unsupervised pre-processing

Cross-validation is the de facto standard for predictive model evaluation and selection. In proper use, it provides an unbiased estimate of a model's predictive performance. However, data sets often undergo various forms of data-dependent…

Methodology · Statistics 2023-01-18 Amit Moscovich, Saharon Rosset

Cross-validation

This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given…

Statistics Theory · Mathematics 2017-03-10 Sylvain Arlot

Permutation-based Hypothesis Testing for Neural Networks

Neural networks are powerful predictive models, but they provide little insight into the nature of relationships between predictors and outcomes. Although numerous methods have been proposed to quantify the relative contributions of input…

Methodology · Statistics 2023-01-30 Francesca Mandel, Ian Barnett

Comparison of predictive values with paired samples

Positive predictive value and negative predictive value are two widely used parameters to assess the clinical usefulness of a medical diagnostic test. When there are two diagnostic tests, it is recommendable to make a comparative assessment…

Methodology · Statistics 2024-05-29 Antonio Martín Andrés, Pedro Femia Marzo

Hypothesis testing at the extremes: fast and robust association for high-throughput data

A number of biomedical problems require performing many hypothesis tests, with an attendant need to apply stringent thresholds. Often the data take the form of a series of predictor vectors, each of which must be compared with a single…

Methodology · Statistics 2014-05-13 Yi-Hui Zhou, Fred Wright

Correcting for selection bias via cross-validation in the classification of microarray data

There is increasing interest in the use of diagnostic rules based on microarray data. These rules are formed by considering the expression levels of thousands of genes in tissue samples taken on patients of known classification with respect…

Statistics Theory · Mathematics 2008-12-18 G. J. McLachlan, J. Chevelu, J. Zhu

Efficient Estimation of the Maximal Association between Multiple Predictors and a Survival Outcome

This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much…

Methodology · Statistics 2021-12-22 Tzu-Jung Huang, Alex Luedtke, Ian W. McKeague

A Honest Cross-Validation Estimator for Prediction Performance

Cross-validation is a standard tool for obtaining a honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model…

Machine Learning · Statistics 2025-10-10 Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

Confidence regions for univariate and multivariate data using permutation tests

Confidence intervals are central to statistical inference as a tool to evaluate the type I error risk at a given significance level. We devise a method to construct confidence intervals using a single run of a permutation test. This…

Methodology · Statistics 2022-06-22 Niels Lundtorp Olsen

Falsifying Predictive Algorithm

Empirical investigations into unintended model behavior often show that the algorithm is predicting another outcome than what was intended. These exposes highlight the need to identify when algorithms predict unintended quantities - ideally…

Methodology · Statistics 2026-01-27 Amanda Coston

Permutation Testing for Dependence in Time Series

Given observations from a stationary time series, permutation tests allow one to construct exactly level $\alpha$ tests under the null hypothesis of an i.i.d. (or, more generally, exchangeable) distribution. On the other hand, when the null…

Statistics Theory · Mathematics 2020-09-09 Joseph P. Romano, Marius A. Tirlea

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot, Alain Celisse