Related papers: Some Theory For Practical Classifier Validation

200 papers

Model selection on validation data is an essential step in machine learning. While the mixing of data between training and validation is considered taboo, practitioners often violate it to increase performance. Here, we offer a simple,…

Machine Learning · Statistics 2018-02-19 Guy Tennenholtz, Tom Zahavy, Shie Mannor

Cross-validation is widely used for selecting among a family of learning rules. This paper studies a related method, called aggregated hold-out (Agghoo), which mixes cross-validation with aggregation; Agghoo can also be related to bagging.…

Statistics Theory · Mathematics 2017-09-13 Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle

As the main workhorse for model selection, Cross Validation (CV) has achieved an empirical success due to its simplicity and intuitiveness. However, despite its ubiquitous role, CV often falls into the following notorious dilemmas. On the…

Machine Learning · Computer Science 2020-12-29 Weikai Li, Chuanxing Geng, Songcan Chen

Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the…

Machine Learning · Computer Science 2022-12-21 Jongheon Jeong, Seojin Kim, Jinwoo Shin

We present a methodology for model evaluation and selection where the sampling mechanism violates the i.i.d. assumption. Our methodology involves a formulation of the bias between the standard Cross-Validation (CV) estimator and the mean…

Methodology · Statistics 2025-03-14 Oren Yuval, Saharon Rosset

State-of-the-art machine learning often follows a two-stage process: (i) pre-training on large, general-purpose datasets; (ii) fine-tuning on task-specific data. In fine-tuning, selecting training examples that closely reflect the…

Machine Learning · Computer Science 2025-10-02 Ayush Jain, Andrea Montanari, Eren Sasoglu

We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners) but each trained…

Machine Learning · Computer Science 2020-03-03 Mahmoud Albardan, John Klein, Olivier Colot

We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass…

Machine Learning · Computer Science 2013-02-19 Amit Daniely, Sivan Sabato, Shai Shalev-Shwartz

We show, to our knowledge, the first theoretical treatments of two common questions in cross-validation based hyperparameter selection: (1) After selecting the best hyperparameter using a held-out set, we train the final model using…

Machine Learning · Computer Science 2023-01-13 Parikshit Ram, Alexander G. Gray, Horst C. Samulowitz, Gregory Bramble

[Context] The use of defect prediction models, such as classifiers, can support testing resource allocations by using data of the previous releases of the same project for predicting which software components are likely to be defective. A…

Software Engineering · Computer Science 2020-08-03 Davide Falessi, Jacky Huang, Likhita Narayana, Jennifer Fong Thai, Burak Turhan

A natural method for approximating out-of-sample predictive evaluation is leave-one-out cross-validation (LOOCV): we alternately hold out each case from a full data set and then train a Bayesian model using Markov chain Monte Carlo…

Methodology · Statistics 2017-04-28 Longhai Li, Shi Qiu, Bei Zhang, Cindy X. Feng

Pre-validation is a way to build a prediction model with two datasets of significantly different feature dimensions. Previous work showed that the asymptotic distribution of the resulting test statistic for the pre-validated predictor…

Methodology · Statistics 2025-05-23 Jing Shang, Sourav Chatterjee, Trevor Hastie, Robert Tibshirani

Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems. This means classifiers must generalize from limited evidence, but…

Computation and Language · Computer Science 2020-05-19 Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned…

Applications · Statistics 2025-06-18 Maria L. Weese, Byran J. Smucker, David J. Edwards

Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In…

Statistics Theory · Mathematics 2024-08-22 Garud Iyengar, Henry Lam, Tianyu Wang

Disagreement-based approaches generate multiple classifiers and exploit the disagreement among them with unlabeled data to improve learning performance. Co-training is a representative paradigm of them, which trains two classifiers…

Machine Learning · Computer Science 2017-08-16 Wei Wang, Zhi-Hua Zhou

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised classification setting, we investigate whether…

Machine Learning · Computer Science 2025-10-30 Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis

This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic.…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney

Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that…

Machine Learning · Computer Science 2025-03-13 Shoma Yokura, Akihisa Ichiki