Related papers: Learning from Imperfect Annotations

200 papers

Building an accurate computer-aided diagnosis system based on data-driven approaches requires a large amount of high-quality labeled data. In medical imaging analysis, multiple expert annotators often produce subjective estimates about…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-22 · Khiem H. Le, Tuan V. Tran, Hieu H. Pham, Hieu T. Nguyen, Tung T. Le, Ha Q. Nguyen

Researchers have raised awareness about the harms of aggregating labels especially in subjective tasks that naturally contain disagreements among human annotators. In this work we show that models that are only provided aggregated labels…

Computation and Language · Computer Science · 2024-03-08 · Abhishek Anand, Negar Mokhberian, Prathyusha Naresh Kumar, Anweasha Saha, Zihao He, Ashwin Rao, Fred Morstatter, Kristina Lerman

Human annotations are vital to supervised learning, yet annotators often disagree on the correct label, especially as annotation tasks increase in complexity. A strategy to improve label quality is to ask multiple annotators to label the…

Machine Learning · Computer Science · 2023-12-22 · Alexander Braylan, Madalyn Marabella, Omar Alonso, Matthew Lease

Traditional supervised learning requires ground truth labels for the training data, whose collection can be difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution through resorting to…

Machine Learning · Computer Science · 2021-07-13 · Ye Shi, Shao-Yuan Li, Sheng-Jun Huang

Selecting an effective training signal for machine learning tasks is difficult: expert annotations are expensive, and crowd-sourced annotations may not be reliable. Recent work has demonstrated that learning from a distribution over labels…

Computation and Language · Computer Science · 2025-04-23 · Dustin Wright, Isabelle Augenstein

Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-27 · Yuan-Hong Liao, Amlan Kar, Sanja Fidler

Uncertainty in machine learning models is a timely and vast field of research. In supervised learning, uncertainty can already occur in the first stage of the training process, the annotation phase. This scenario is particularly evident…

Machine Learning · Computer Science · 2024-07-24 · Katharina Hechinger, Christoph Koller, Xiao Xiang Zhu, Göran Kauermann

Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent…

Methodology · Statistics · 2026-04-10 · Robert Chew, Stephanie Eckman, Christoph Kern, Frauke Kreuter

Cognitive computing systems require human labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to…

Computation and Language · Computer Science · 2018-09-27 · Anca Dumitrache, Lora Aroyo, Chris Welty

Humans can be notoriously imperfect evaluators. They are often biased, unreliable, and unfit to define "ground truth." Yet, given the surging need to produce large amounts of training data in educational applications using AI, traditional…

Artificial Intelligence · Computer Science · 2025-08-04 · Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger

A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple…

Machine Learning · Statistics · 2013-05-02 · Balaji Lakshminarayanan, Yee Whye Teh

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…

Computation and Language · Computer Science · 2022-09-27 · Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the…

Machine Learning · Computer Science · 2021-06-15 · Zhendong Chu, Jing Ma, Hongning Wang

Crowd-sourcing is a cheap and popular means of creating training and evaluation datasets for machine learning, however it poses the problem of 'truth inference', as individual workers cannot be wholly trusted to provide reliable…

Machine Learning · Computer Science · 2019-02-26 · Yuan Li, Benjamin I. P. Rubinstein, Trevor Cohn

High-quality data is necessary for modern machine learning. However, the acquisition of such data is difficult due to noisy and ambiguous annotations of humans. The aggregation of such annotations to determine the label of an image leads to…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-07 · Lars Schmarje, Vasco Grossmann, Claudius Zelenka, Sabine Dippel, Rainer Kiko, Mariusz Oszust, Matti Pastell, Jenny Stracke, Anna Valros, Nina Volkmann, Reinhard Koch

Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and inherent ambiguity of language, we hypothesize that single label is not sufficient to…

Computation and Language · Computer Science · 2021-09-14 · Shujian Zhang, Chengyue Gong, Eunsol Choi

High-quality data annotation is an essential but laborious and costly aspect of developing machine learning-based software. We explore the inherent tradeoff between annotation accuracy and cost by detecting and removing minority reports --…

Machine Learning · Computer Science · 2025-04-15 · Hsuan Wei Liao, Christopher Klugmann, Daniel Kondermann, Rafid Mahmood

Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of such noise.…

Computation and Language · Computer Science · 2022-06-14 · Derek Chen, Zhou Yu, Samuel R. Bowman

Crowdsourcing is a popular approach to collect annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive untrained annotators for each data instance which are then combined to…

Machine Learning · Computer Science · 2020-05-08 · Anil Ramakrishna, Rahul Gupta, Shrikanth Narayanan

Sequence labeling is a fundamental framework for various natural language processing problems. Its performance is largely influenced by the annotation quality and quantity in supervised learning scenarios, and obtaining ground truth labels…

Computation and Language · Computer Science · 2020-04-17 · Ouyu Lan, Xiao Huang, Bill Yuchen Lin, He Jiang, Liyuan Liu, Xiang Ren