Related papers: Cold-start Active Learning through Self-supervised…

PT4AL: Using Self-Supervised Pretext Tasks for Active Learning

Labeling a large set of data is expensive. Active learning aims to tackle this problem by asking to annotate only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2022-07-27 John Seon Keun Yi, Minseok Seo, Jongchan Park, Dong-Geol Choi

Active learning for reducing labeling effort in text classification tasks

Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce…

Computation and Language · Computer Science 2021-11-05 Pieter Floris Jacobs, Gideon Maillette de Buy Wenniger, Marco Wiering, Lambert Schomaker

Reducing Label Effort: Self-Supervised meets Active Learning

Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a…

Computer Vision and Pattern Recognition · Computer Science 2021-08-27 Javad Zolfaghari Bengar, Joost van de Weijer, Bartlomiej Twardowski, Bogdan Raducanu

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, only a little research has explored the…

Computation and Language · Computer Science 2020-12-07 Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

Towards Efficient Active Learning in NLP via Pretrained Representations

Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications. When labeled documents are scarce, active learning helps save annotation efforts but requires retraining of massive…

Machine Learning · Computer Science 2024-02-27 Artem Vysogorets, Achintya Gopal

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

Active learning is an iterative labeling process that is used to obtain a small labeled subset, despite the absence of labeled data, thereby enabling the training of a model for supervised tasks such as text classification. While active learning…

Computation and Language · Computer Science 2024-10-07 Christopher Schröder, Gerhard Heyer

Select, Label, Evaluate: Active Testing in NLP

Human annotation cost and time remain significant bottlenecks in Natural Language Processing (NLP), with test data annotation being particularly expensive due to the stringent requirement for low-error and high-quality labels necessary for…

Computation and Language · Computer Science 2026-03-24 Antonio Purificato, Maria Sofia Bucarelli, Andrea Bacciu, Amin Mantrach, Fabrizio Silvestri

Cold Start Active Learning Strategies in the Context of Imbalanced Classification

We present novel active learning strategies dedicated to providing a solution to the cold start stage, i.e. initializing the classification of a large set of data with no attached labels. Moreover, proposed strategies are designed to handle…

Machine Learning · Computer Science 2022-01-26 Etienne Brangbour, Pierrick Bruneau, Thomas Tamisier, Stéphane Marchand-Maillet

Pre-trained Language Model Based Active Learning for Sentence Matching

Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and…

Computation and Language · Computer Science 2020-10-13 Guirong Bai, Shizhu He, Kang Liu, Jun Zhao, Zaiqing Nie

Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples

The cost of annotating transcriptions for large speech corpora becomes a bottleneck to maximally enjoy the potential capacity of deep neural network-based automatic speech recognition models. In this paper, we present a new training…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-06 Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha

Cold-Start Active Preference Learning in Socio-Economic Domains

Active preference learning offers an efficient approach to modeling preferences, but it is hindered by the cold-start problem, which leads to a marked decline in performance when no initial labeled data are available. While cold-start…

Machine Learning · Computer Science 2025-11-04 Mojtaba Fayaz-Bakhsh, Danial Ataee, MohammadAmin Fazli

Exemplar Guided Active Learning

We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. We are motivated by the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a…

Machine Learning · Computer Science 2020-11-04 Jason Hartford, Kevin Leyton-Brown, Hadas Raviv, Dan Padnos, Shahar Lev, Barak Lenz

Active Learning: Problem Settings and Recent Developments

In supervised learning, acquiring labeled training data for a predictive model can be very costly, but acquiring a large amount of unlabeled data is often quite easy. Active learning is a method of obtaining predictive models with high…

Machine Learning · Computer Science 2020-12-17 Hideitsu Hino

ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios

Active learning is designed to minimize annotation efforts by prioritizing instances that most enhance learning. However, many active learning strategies struggle with a 'cold-start' problem, needing substantial initial data to be…

Computation and Language · Computer Science 2026-01-14 Markus Bayer, Justin Lutz, Christian Reuter

Active Testing: Sample-Efficient Model Evaluation

We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of…

Machine Learning · Statistics 2021-06-15 Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth

Learning to Sample: an Active Learning Framework

Meta-learning algorithms for active learning are emerging as a promising paradigm for learning the "best" active learning strategy. However, current learning-based active learning approaches still require sufficient training data so as to…

Machine Learning · Computer Science 2019-09-10 Jingyu Shao, Qing Wang, Fangbing Liu

Active Learning for Natural Language Generation

The field of Natural Language Generation (NLG) suffers from a severe shortage of labeled data due to the extremely expensive and time-consuming process involved in manual annotation. A natural approach for coping with this problem is active…

Computation and Language · Computer Science 2023-10-18 Yotam Perlitz, Ariel Gera, Michal Shmueli-Scheuer, Dafna Sheinwald, Noam Slonim, Liat Ein-Dor

Inconsistency-Based Data-Centric Active Open-Set Annotation

Active learning is a commonly used approach that reduces the labeling effort required to train deep neural networks. However, the effectiveness of current active learning methods is limited by their closed-world assumptions, which assume…

Machine Learning · Computer Science 2024-01-11 Ruiyu Mao, Ouyang Xu, Yunhui Guo

Efficient Process Reward Model Training via Active Learning

Process Reward Models (PRMs) provide step-level supervision to large language models (LLMs), but scaling up training data annotation remains challenging for both humans and LLMs. To address this limitation, we propose an active learning…

Machine Learning · Computer Science 2025-04-16 Keyu Duan, Zichen Liu, Xin Mao, Tianyu Pang, Changyu Chen, Qiguang Chen, Michael Qizhe Shieh, Longxu Dou

Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost

Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high value data that can best improve model performance. In pool-based active learning, accessible unlabeled data…

Machine Learning · Computer Science 2020-07-21 Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister