Related papers: Learning to Contextually Aggregate Multi-Source Su…

Learning from Crowds by Modeling Common Confusions

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the…

Machine Learning · Computer Science 2021-06-15 Zhendong Chu, Jing Ma, Hongning Wang

Truth Discovery in Sequence Labels from Crowds

Annotation quality and quantity positively affect the learning performance of sequence labeling, a vital task in Natural Language Processing. Hiring domain experts to annotate a corpus is very costly in terms of money and time.…

Human-Computer Interaction · Computer Science 2023-07-04 Nasim Sabetpour, Adithya Kulkarni, Sihong Xie, Qi Li

Learning from Imperfect Annotations

Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…

Machine Learning · Computer Science 2020-04-08 Emmanouil Antonios Platanios, Maruan Al-Shedivat, Eric Xing, Tom Mitchell

Aggregating Soft Labels from Crowd Annotations Improves Uncertainty Estimation Under Distribution Shift

Selecting an effective training signal for machine learning tasks is difficult: expert annotations are expensive, and crowd-sourced annotations may not be reliable. Recent work has demonstrated that learning from a distribution over labels…

Computation and Language · Computer Science 2025-04-23 Dustin Wright, Isabelle Augenstein

Learning from Crowds with Sparse and Imbalanced Annotations

Traditional supervised learning requires ground truth labels for the training data, whose collection can be difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution through resorting to…

Machine Learning · Computer Science 2021-07-13 Ye Shi, Shao-Yuan Li, Sheng-Jun Huang

Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension

Cloze-style reading comprehension has been a popular task for measuring the progress of natural language understanding in recent years. In this paper, we design a novel multi-perspective framework, which can be seen as the joint training of…

Computation and Language · Computer Science 2018-08-21 Liang Wang, Sujian Li, Wei Zhao, Kewei Shen, Meng Sun, Ruoyu Jia, Jingming Liu

Crowdsourcing Learning as Domain Adaptation: A Case Study on Named Entity Recognition

Crowdsourcing is regarded as one prospective solution for effective supervised learning, aiming to build large-scale annotated training data by crowd workers. Previous studies focus on reducing the influences from the noises of the…

Computation and Language · Computer Science 2021-11-16 Xin Zhang, Guangwei Xu, Yueheng Sun, Meishan Zhang, Pengjun Xie

Label Selection Approach to Learning from Crowds

Supervised learning, especially supervised deep learning, requires large amounts of labeled data. One approach to collect large amounts of labeled data is by using a crowdsourcing platform where numerous workers perform the annotation…

Machine Learning · Computer Science 2023-08-22 Kosuke Yoshimura, Hisashi Kashima

Adapt, Agree, Aggregate: Semi-Supervised Ensemble Labeling for Graph Convolutional Networks

In this paper, we propose a novel framework that combines ensemble learning with augmented graph structures to improve the performance and robustness of semi-supervised node classification in graphs. By creating multiple augmented views of…

Machine Learning · Computer Science 2025-03-25 Maryam Abdolali, Romina Zakerian, Behnam Roshanfekr, Fardin Ayar, Mohammad Rahmati

Learning Ambiguity from Crowd Sequential Annotations

Most crowdsourcing learning methods treat disagreement between annotators as noisy labelings while inter-disagreement among experts is often a good indicator for the ambiguity and uncertainty that is inherent in natural language. In this…

Computation and Language · Computer Science 2023-01-05 Xiaolei Lu

From Ground Truth to Measurement: A Statistical Framework for Human Labeling

Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent…

Methodology · Statistics 2026-04-10 Robert Chew, Stephanie Eckman, Christoph Kern, Frauke Kreuter

Coupled Confusion Correction: Learning from Crowds with Sparse Annotations

As datasets grow larger, accurately annotating them is becoming increasingly impractical because of the cost in both time and money. Therefore, crowd-sourcing has been widely adopted to alleviate the cost of…

Machine Learning · Computer Science 2024-02-21 Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge

Modeling sequential annotations for sequence labeling with crowds

Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling. Different from tagging independent instances, for crowd sequential annotations the quality of label sequence relies on…

Computation and Language · Computer Science 2022-09-21 Xiaolei Lu, Tommy W. S. Chow

Meta-learning Representations for Learning from Multiple Annotators

We propose a meta-learning method for learning from multiple noisy annotators. In many applications such as crowdsourcing services, labels for supervised learning are given by multiple annotators. Since the annotators have different skills…

Machine Learning · Computer Science 2025-06-13 Atsutoshi Kumagai, Tomoharu Iwata, Taishi Nishiyama, Yasutoshi Ida, Yasuhiro Fujiwara

Don't Blame the Data, Blame the Model: Understanding Noise and Bias When Learning from Subjective Annotations

Researchers have raised awareness about the harms of aggregating labels especially in subjective tasks that naturally contain disagreements among human annotators. In this work we show that models that are only provided aggregated labels…

Computation and Language · Computer Science 2024-03-08 Abhishek Anand, Negar Mokhberian, Prathyusha Naresh Kumar, Anweasha Saha, Zihao He, Ashwin Rao, Fred Morstatter, Kristina Lerman

Multi-Label Annotation Aggregation in Crowdsourcing

As a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous…

Machine Learning · Computer Science 2020-10-20 Xuan Wei, Daniel Dajun Zeng, Junming Yin

A General Model for Aggregating Annotations Across Simple, Complex, and Multi-Object Annotation Tasks

Human annotations are vital to supervised learning, yet annotators often disagree on the correct label, especially as annotation tasks increase in complexity. A strategy to improve label quality is to ask multiple annotators to label the…

Machine Learning · Computer Science 2023-12-22 Alexander Braylan, Madalyn Marabella, Omar Alonso, Matthew Lease

Towards Long-term Annotators: A Supervised Label Aggregation Baseline

Relying on crowdsourced workers, data crowdsourcing platforms are able to efficiently provide vast amounts of labeled data. Due to the variability in the annotation quality of crowd workers, modern techniques resort to redundant annotations…

Human-Computer Interaction · Computer Science 2023-11-28 Haoyu Liu, Fei Wang, Minmin Lin, Runze Wu, Renyu Zhu, Shiwei Zhao, Kai Wang, Tangjie Lv, Changjie Fan

CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific Perturbations for Tabular Data

Samples with ground truth labels may not always be available in numerous domains. While learning from crowdsourcing labels has been explored, existing models can still fail in the presence of sparse, unreliable, or diverging annotations.…

Machine Learning · Computer Science 2021-12-07 Mani Sotoodeh, Li Xiong, Joyce C. Ho

Aggregating From Multiple Target-Shifted Sources

Multi-source domain adaptation aims at leveraging the knowledge from multiple tasks for predicting a related target domain. Hence, a crucial aspect is to properly combine different sources based on their relations. In this paper, we…

Machine Learning · Computer Science 2021-06-16 Changjian Shui, Zijian Li, Jiaqi Li, Christian Gagné, Charles Ling, Boyu Wang