Related papers: Some Theory For Practical Classifier Validation

Train on Validation: Squeezing the Data Lemon

Model selection on validation data is an essential step in machine learning. While the mixing of data between training and validation is considered taboo, practitioners often violate it to increase performance. Here, we offer a simple,…

Machine Learning · Statistics 2018-02-19 Guy Tennenholtz, Tom Zahavy, Shie Mannor

Cross-validation improved by aggregation: Agghoo

Cross-validation is widely used for selecting among a family of learning rules. This paper studies a related method, called aggregated hold-out (Agghoo), which mixes cross-validation with aggregation; Agghoo can also be related to bagging.…

Statistics Theory · Mathematics 2017-09-13 Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle

Leave Zero Out: Towards a No-Cross-Validation Approach for Model Selection

As the main workhorse for model selection, Cross Validation (CV) has achieved an empirical success due to its simplicity and intuitiveness. However, despite its ubiquitous role, CV often falls into the following notorious dilemmas. On the…

Machine Learning · Computer Science 2020-12-29 Weikai Li, Chuanxing Geng, Songcan Chen

Confidence-aware Training of Smoothed Classifiers for Certified Robustness

Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the…

Machine Learning · Computer Science 2022-12-21 Jongheon Jeong, Seojin Kim, Jinwoo Shin

Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning

We present a methodology for model evaluation and selection where the sampling mechanism violates the i.i.d. assumption. Our methodology involves a formulation of the bias between the standard Cross-Validation (CV) estimator and the mean…

Methodology · Statistics 2025-03-14 Oren Yuval, Saharon Rosset

Train on Validation (ToV): Fast data selection with applications to fine-tuning

State-of-the-art machine learning often follows a two-stage process: (i) pre-training on large, general-purpose datasets; (ii) fine-tuning on task-specific data. In fine-tuning, selecting training examples that closely reflect the…

Machine Learning · Computer Science 2025-10-02 Ayush Jain, Andrea Montanari, Eren Sasoglu

SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers

We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners) but each trained…

Machine Learning · Computer Science 2020-03-03 Mahmoud Albardan, John Klein, Olivier Colot

Multiclass Learning Approaches: A Theoretical Comparison with Implications

We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass…

Machine Learning · Computer Science 2013-02-19 Amit Daniely, Sivan Sabato, Shai Shalev-Shwartz

Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection

We show, to our knowledge, the first theoretical treatments of two common questions in cross-validation based hyperparameter selection: (1) After selecting the best hyperparameter using a held-out set, we train the final model using…

Machine Learning · Computer Science 2023-01-13 Parikshit Ram, Alexander G. Gray, Horst C. Samulowitz, Gregory Bramble

On the Need of Preserving Order of Data When Validating Within-Project Defect Classifiers

[Context] The use of defect prediction models, such as classifiers, can support testing resource allocations by using data of the previous releases of the same project for predicting which software components are likely to be defective. A…

Software Engineering · Computer Science 2020-08-03 Davide Falessi, Jacky Huang, Likhita Narayana, Jennifer Fong Thai, Burak Turhan

Approximating Cross-validatory Predictive Evaluation in Bayesian Latent Variables Models with Integrated IS and WAIC

A natural method for approximating out-of-sample predictive evaluation is leave-one-out cross-validation (LOOCV): we alternately hold out each case from a full data set and then train a Bayesian model using Markov chain Monte Carlo…

Methodology · Statistics 2017-04-28 Longhai Li, Shi Qiu, Bei Zhang, Cindy X. Feng

Pre-validation Revisited

Pre-validation is a way to build a prediction model with two datasets of significantly different feature dimensions. Previous work showed that the asymptotic distribution of the resulting test statistic for the pre-validated predictor…

Methodology · Statistics 2025-05-23 Jing Shang, Sourav Chatterjee, Trevor Hastie, Robert Tibshirani

Text Classification with Few Examples using Controlled Generalization

Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems. This means classifiers must generalize from limited evidence, but…

Computation and Language · Computer Science 2020-05-19 Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth

The use of cross validation in the analysis of designed experiments

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned…

Applications · Statistics 2025-06-18 Maria L. Weese, Byran J. Smucker, David J. Edwards

Is Cross-Validation the Gold Standard to Evaluate Model Performance?

Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In…

Statistics Theory · Mathematics 2024-08-22 Garud Iyengar, Henry Lam, Tianyu Wang

Theoretical Foundation of Co-Training and Disagreement-Based Algorithms

Disagreement-based approaches generate multiple classifiers and exploit the disagreement among them with unlabeled data to improve learning performance. Co-training is a representative paradigm of them, which trains two classifiers…

Machine Learning · Computer Science 2017-08-16 Wei Wang, Zhi-Hua Zhou

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks

Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised classification setting, we investigate whether…

Machine Learning · Computer Science 2025-10-30 Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis

Theoretical Analyses of Cross-Validation Error and Voting in Instance-Based Learning

This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic.…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney

A method for classification of data with uncertainty using hypothesis testing

Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that…

Machine Learning · Computer Science 2025-03-13 Shoma Yokura, Akihisa Ichiki