Related papers: Model Validation Using Mutated Training Labels: An…

Empirical Comparison between Cross-Validation and Mutation-Validation in Model Selection

Mutation validation (MV) is a recently proposed approach for model selection, garnering significant interest due to its unique characteristics and potential benefits compared to the widely used cross-validation (CV) method. In this study,…

Machine Learning · Computer Science · 2024-07-25 · Jinyang Yu, Sami Hamdan, Leonard Sasse, Abigail Morrison, Kaustubh R. Patil

Train on Validation: Squeezing the Data Lemon

Model selection on validation data is an essential step in machine learning. While the mixing of data between training and validation is considered taboo, practitioners often violate it to increase performance. Here, we offer a simple,…

Machine Learning · Statistics · 2018-02-19 · Guy Tennenholtz, Tom Zahavy, Shie Mannor

Train on Validation (ToV): Fast data selection with applications to fine-tuning

State-of-the-art machine learning often follows a two-stage process: (i) pre-training on large, general-purpose datasets; (ii) fine-tuning on task-specific data. In fine-tuning, selecting training examples that closely reflect the…

Machine Learning · Computer Science · 2025-10-02 · Ayush Jain, Andrea Montanari, Eren Sasoglu

Fast and Informative Model Selection using Learning Curve Cross-Validation

Common cross-validation (CV) methods like k-fold cross-validation or Monte-Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing on the remaining…

Machine Learning · Computer Science · 2021-11-30 · Felix Mohr, Jan N. van Rijn

Training and Evaluating with Human Label Variation: An Empirical Study

Human label variation (HLV) challenges the standard assumption that a labelled instance has a single ground truth, instead embracing the natural variation in human annotation to train and evaluate models. While various training methods and…

Machine Learning · Computer Science · 2025-10-14 · Kemal Kurniawan, Meladel Mistica, Timothy Baldwin, Jey Han Lau

Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance

The study of model bias and variance with respect to decision boundaries is critically important in supervised classification. There is generally a tradeoff between the two, as fine-tuning of the decision boundary of a classification model…

Machine Learning · Computer Science · 2020-02-25 · Matthew Almeida, Wei Ding, Scott Crouter, Ping Chen

Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation

Changes to hyperparameters can have a dramatic effect on model accuracy. Thus, the tuning of hyperparameters plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-19 · Kevin Musgrave, Serge Belongie, Ser-Nam Lim

The Majority Vote Paradigm Shift: When Popular Meets Optimal

Reliably labelling data typically requires annotations from multiple human workers. However, humans are far from being perfect. Hence, it is a common practice to aggregate labels gathered from multiple annotators to make a more confident…

Machine Learning · Statistics · 2026-02-17 · Antonio Purificato, Maria Sofia Bucarelli, Anil Kumar Nelakanti, Andrea Bacciu, Fabrizio Silvestri, Amin Mantrach

Evaluating of Machine Unlearning: Robustness Verification Without Prior Modifications

Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. While extensive research has focused on developing efficient unlearning…

Cryptography and Security · Computer Science · 2024-10-15 · Heng Xu, Tianqing Zhu, Wanlei Zhou

Validity Learning on Failures: Mitigating the Distribution Shift in Autonomous Vehicle Planning

The planning problem constitutes a fundamental aspect of the autonomous driving framework. Recent strides in representation learning have empowered vehicles to comprehend their surrounding environments, thereby facilitating the integration…

Robotics · Computer Science · 2024-09-25 · Fazel Arasteh, Mohammed Elmahgiubi, Behzad Khamidehi, Hamidreza Mirkhani, Weize Zhang, Cao Tongtong, Kasra Rezaee

Hybrid Approach for Inductive Semi Supervised Learning using Label Propagation and Support Vector Machine

Semi supervised learning methods have gained importance in today's world because of large expenses and time involved in labeling the unlabeled data by human experts. The proposed hybrid approach uses SVM and Label Propagation to label the…

Machine Learning · Computer Science · 2015-12-08 · Aruna Govada, Pravin Joshi, Sahil Mittal, Sanjay K Sahay

Robustness of Accuracy Metric and its Inspirations in Learning with Noisy Labels

For multi-class classification under class-conditional label noise, we prove that the accuracy metric itself can be robust. We concretize this finding's inspiration in two essential aspects: training and validation, with which we address…

Machine Learning · Computer Science · 2020-12-09 · Pengfei Chen, Junjie Ye, Guangyong Chen, Jingwei Zhao, Pheng-Ann Heng

Measuring Overfitting in Convolutional Neural Networks using Adversarial Perturbations and Label Noise

Although numerous methods to reduce the overfitting of convolutional neural networks (CNNs) exist, it is still not clear how to confidently measure the degree of overfitting. A metric reflecting the overfitting level might be, however,…

Machine Learning · Computer Science · 2022-09-28 · Svetlana Pavlitskaya, Joël Oswald, J. Marius Zöllner

A Variance Maximization Criterion for Active Learning

Active learning aims to train a classifier as fast as possible with as few labels as possible. The core element in virtually any active learning strategy is the criterion that measures the usefulness of the unlabeled data based on which new…

Machine Learning · Statistics · 2018-02-13 · Yazhou Yang, Marco Loog

A meta-learning recommender system for hyperparameter tuning: predicting when tuning improves SVM classifiers

For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets,…

Machine Learning · Computer Science · 2019-06-13 · Rafael Gomes Mantovani, André Luis Debiaso Rossi, Edesio Alcobaça, Joaquin Vanschoren, André Carlos Ponce de Leon Ferreira de Carvalho

About Explicit Variance Minimization: Training Neural Networks for Medical Imaging With Limited Data Annotations

Self-supervised learning methods for computer vision have demonstrated the effectiveness of pre-training feature representations, resulting in well-generalizing Deep Neural Networks, even if the annotated data are limited. However,…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-25 · Dmitrii Shubin, Danny Eytan, Sebastian D. Goodfellow

Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery

Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a "best" model. For example, the method of $k$-nearest neighbors requires the user to choose $k$, the number of neighbors,…

Applications · Statistics · 2012-03-01 · Hui Shen, William J. Welch, Jacqueline M. Hughes-Oliver

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics · 2017-12-25 · Jing Lei

Pseudo-D: Informing Multi-View Uncertainty Estimation with Calibrated Neural Training Dynamics

Computer-aided diagnosis systems must make critical decisions from medical images that are often noisy, ambiguous, or conflicting, yet today's models are trained on overly simplistic labels that ignore diagnostic uncertainty. One-hot labels…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-16 · Ang Nan Gu, Michael Tsang, Hooman Vaseli, Purang Abolmaesumi, Teresa Tsang

Machine Learning from Explanations

Acquiring and training on large-scale labeled data can be impractical due to cost constraints. Additionally, the use of small training datasets can result in considerable variability in model outcomes, overfitting, and learning of spurious…

Machine Learning · Computer Science · 2025-07-08 · Jiashu Tao, Reza Shokri