Related papers: Diversified Batch Selection for Training Accelerat…

Reinforced Data Sampling for Model Diversification

With the rising number of machine learning competitions, the world has witnessed an exciting race for the best algorithms. However, the involved data selection process may fundamentally suffer from evidence ambiguity and concept drift…

Machine Learning · Computer Science 2020-06-15 Hoang D. Nguyen, Xuan-Son Vu, Quoc-Tuan Truong, Duc-Trong Le

Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models

The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered…

Software Engineering · Computer Science 2021-03-10 Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

The remarkable success of contrastive-learning-based multimodal models has been greatly driven by training on ever-larger datasets with expensive compute consumption. Sample selection as an alternative efficient paradigm plays an important…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models

Diverse outputs in text generation are necessary for effective exploration in complex reasoning tasks, such as code generation and mathematical problem solving. Such Pass@$k$ problems benefit from distinct candidates covering the solution…

Computation and Language · Computer Science 2026-03-06 Sean Lamont, Christian Walder, Paul Montague, Amir Dezfouli, Michael Norrish

Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data…

Computer Vision and Pattern Recognition · Computer Science 2018-05-30 Vishal Kaushal, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Suyash Shetty, Pankaj Singh, Rishabh Iyer, Ganesh Ramakrishnan

Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples

Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected…

Machine Learning · Computer Science 2024-03-28 Ao Zhou, Bin Liu, Jin Wang, Grigorios Tsoumakas

Differentiable Model Selection for Ensemble Learning

Model selection is a strategy aimed at creating accurate and robust models. A key challenge in designing these algorithms is identifying the optimal model for classifying any particular input sample. This paper addresses this challenge and…

Machine Learning · Computer Science 2023-05-22 James Kotary, Vincenzo Di Vito, Ferdinando Fioretto

Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing

Deep learning model effectiveness in classification tasks is often challenged by the quality and quantity of training data whenever they are affected by strong spurious correlations between specific attributes and target labels. This…

Machine Learning · Computer Science 2025-10-27 Massimiliano Ciranni, Vito Paolo Pastore, Roberto Di Via, Enzo Tartaglione, Francesca Odone, Vittorio Murino

Diverse mini-batch Active Learning

We study the problem of reducing the amount of labeled training data required to train supervised classification models. We approach it by leveraging Active Learning, through sequential selection of examples which benefit the model most.…

Machine Learning · Computer Science 2019-01-18 Fedor Zhdanov

Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning

Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…

Machine Learning · Computer Science 2025-07-18 Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou

Towards Accelerated Model Training via Bayesian Data Selection

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety…

Machine Learning · Computer Science 2023-11-08 Zhijie Deng, Peng Cui, Jun Zhu

Batch Active Learning at Scale

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched…

Machine Learning · Computer Science 2021-08-02 Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

Rethinking Representativeness and Diversity in Dynamic Data Selection

Dynamic data selection accelerates training by sampling a changing subset of the dataset while preserving accuracy. We rethink two core notions underlying sample evaluation: representativeness and diversity. Instead of local geometric…

Artificial Intelligence · Computer Science 2026-03-06 Yuzhe Zhou, Zhenglin Hua, Haiyun Guo, Yuheng Jia

Batch Active Learning Using Determinantal Point Processes

Data collection and labeling is one of the main challenges in employing machine learning algorithms in a variety of real-world applications with limited data. While active learning methods attempt to tackle this issue by labeling only the…

Machine Learning · Computer Science 2019-06-20 Erdem Bıyık, Kenneth Wang, Nima Anari, Dorsa Sadigh

RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment

Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more…

Machine Learning · Computer Science 2025-06-27 Suorong Yang, Peijia Li, Furao Shen, Jian Zhao

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turn around…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan

Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement

Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities. As instruction datasets proliferate, selecting optimal data for effective training becomes…

Computation and Language · Computer Science 2024-09-18 Simon Yu, Liangyu Chen, Sara Ahmadian, Marzieh Fadaee

Diversity-Aware Batch-Mode Active Learning for Efficient Sampling in Data-Driven Constitutive Modeling

The constitutive behavior of materials is modeled through relationships between stress, strain, and possibly additional internal variables. This results in relatively high-dimensional feature spaces for machine learning models rendering the…

Computational Physics · Physics 2026-05-20 Ronak Shoghi, Lukas Morand, Dirk Helm, Alexander Hartmaier

OASIS: Online Sample Selection for Continual Visual Instruction Tuning

In continual instruction tuning (CIT) scenarios, where new instruction tuning data continuously arrive in an online streaming manner, training delays from large-scale data significantly hinder real-time adaptation. Data selection can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Minjae Lee, Minhyuk Seo, Tingyu Qu, Tinne Tuytelaars, Jonghyun Choi

A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization

Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based…

Machine Learning · Computer Science 2025-08-25 Sebastian Sanokowski, Sepp Hochreiter, Sebastian Lehner