Related papers: Committee-Based Sample Selection for Probabilistic…

Minimizing Manual Annotation Cost In Supervised Training From Corpora

Corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. This paper investigates methods for reducing annotation cost by sample selection. In this…

cmp-lg · Computer Science 2008-02-03 Sean P. Engelson, Ido Dagan

Identifying Wrongly Predicted Samples: A Method for Active Learning

State-of-the-art machine learning models require access to a significant amount of annotated data in order to achieve the desired level of performance. While unlabelled data can be largely available and even abundant, the annotation process can…

Machine Learning · Computer Science 2020-10-15 Rahaf Aljundi, Nikolay Chumerin, Daniel Olmeda Reino

Learning from Stochastic Labels

Annotating multi-class instances is a crucial task in the field of machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we…

Machine Learning · Computer Science 2025-12-05 Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu

Mask-guided sample selection for Semi-Supervised Instance Segmentation

Image segmentation methods are usually trained with pixel-level annotations, which require significant human effort to collect. The most common solution to address this constraint is to implement weakly-supervised pipelines trained with…

Computer Vision and Pattern Recognition · Computer Science 2020-08-26 Miriam Bellver, Amaia Salvador, Jordi Torres, Xavier Giro-i-Nieto

Diverse Subset Selection via Norm-Based Sampling and Orthogonality

Large annotated datasets are crucial for the success of deep neural networks, but labeling data can be prohibitively expensive in domains such as medical imaging. This work tackles the subset selection problem: selecting a small set of the…

Machine Learning · Computer Science 2025-09-29 Noga Bar, Raja Giryes

Selective Annotation Makes Language Models Better Few-Shot Learners

Many recent approaches to natural language tasks are built on the remarkable abilities of large language models. Large language models can perform in-context learning, where they learn a new task from a few task demonstrations, without any…

Computation and Language · Computer Science 2022-09-07 Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu

Does Informativeness Matter? Active Learning for Educational Dialogue Act Classification

Dialogue Acts (DAs) can be used to explain what expert tutors do and what students know during the tutoring process. Most empirical studies adopt the random sampling method to obtain sentence samples for manual annotation of DAs, which are…

Computation and Language · Computer Science 2023-04-13 Wei Tan, Jionghao Lin, David Lang, Guanliang Chen, Dragan Gasevic, Lan Du, Wray Buntine

In-Context Learning on a Budget: A Case Study in Token Classification

Few-shot in-context learning (ICL) typically assumes access to large annotated training sets. However, in many real-world scenarios, such as domain adaptation, there is only a limited budget to annotate a small number of samples, with the…

Computation and Language · Computer Science 2025-01-29 Uri Berger, Tal Baumel, Gabriel Stanovsky

Enhanced Sample Selection with Confidence Tracking: Identifying Correctly Labeled yet Hard-to-Learn Samples in Noisy Data

We propose a novel sample selection method for image classification in the presence of noisy labels. Existing methods typically consider small-loss samples as correctly labeled. However, some correctly labeled samples are inherently…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Weiran Pan, Wei Wei, Feida Zhu, Yong Deng

Active Scene Learning

Sketch recognition allows natural and efficient interaction in pen-based interfaces. A key obstacle to building accurate sketch recognizers has been the difficulty of creating large amounts of annotated training data. Several authors have…

Computer Vision and Pattern Recognition · Computer Science 2019-03-08 Erelcan Yanik, Tevfik Metin Sezgin

Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models

Large Language Models have recently been applied to text annotation tasks from social sciences, equalling or surpassing the performance of human workers at a fraction of the cost. However, no inquiry has yet been made on the impact of…

Computation and Language · Computer Science 2025-03-11 Louis Abraham, Charles Arnal, Antoine Marie

Label-Efficient Model Selection for Text Generation

Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models. We introduce DiffUse, an efficient method to make an informed decision between candidate text…

Computation and Language · Computer Science 2024-06-07 Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal Shnarch

IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models

In-context learning is a promising paradigm that utilizes in-context examples as prompts for the predictions of large language models. These prompts are crucial for achieving strong performance. However, since the prompts need to be sampled…

Computation and Language · Computer Science 2025-07-15 Shaokun Zhang, Xiaobo Xia, Zhaoqing Wang, Ling-Hao Chen, Jiale Liu, Qingyun Wu, Tongliang Liu

Sample selection for efficient image annotation

Supervised object detection has been proven to be successful in many benchmark datasets achieving human-level performances. However, acquiring a large amount of labeled image samples for supervised detection training is tedious,…

Computer Vision and Pattern Recognition · Computer Science 2021-05-12 Bishwo Adhikari, Esa Rahtu, Heikki Huttunen

Aggregating Soft Labels from Crowd Annotations Improves Uncertainty Estimation Under Distribution Shift

Selecting an effective training signal for machine learning tasks is difficult: expert annotations are expensive, and crowd-sourced annotations may not be reliable. Recent work has demonstrated that learning from a distribution over labels…

Computation and Language · Computer Science 2025-04-23 Dustin Wright, Isabelle Augenstein

Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case,…

Machine Learning · Computer Science 2012-03-19 Yan Yan, Romer Rosales, Glenn Fung, Jennifer Dy

Select, Label, Evaluate: Active Testing in NLP

Human annotation cost and time remain significant bottlenecks in Natural Language Processing (NLP), with test data annotation being particularly expensive due to the stringent requirement for low-error and high-quality labels necessary for…

Computation and Language · Computer Science 2026-03-24 Antonio Purificato, Maria Sofia Bucarelli, Andrea Bacciu, Amin Mantrach, Fabrizio Silvestri

Active Learning for NLP with Large Language Models

Human annotation of training samples is expensive, laborious, and sometimes challenging, especially for Natural Language Processing (NLP) tasks. To reduce the labeling cost and enhance the sample efficiency, Active Learning (AL) technique…

Computation and Language · Computer Science 2024-01-17 Xuesong Wang

Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language…

Computation and Language · Computer Science 2023-12-14 Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Auto-Annotation Quality Prediction for Semi-Supervised Learning with Ensembles

Auto-annotation by ensemble of models is an efficient method of learning on unlabeled data. Wrong or inaccurate annotations generated by the ensemble may lead to performance degradation of the trained model. To deal with this problem we…

Computer Vision and Pattern Recognition · Computer Science 2024-03-14 Dror Simon, Miriam Farber, Roman Goldenberg