Related papers: Visualizing NLP annotations for Crowdsourcing

Crowdsourcing in Computer Vision

Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding, for a vast number of…

Computer Vision and Pattern Recognition · Computer Science 2016-11-08 Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant

Lessons Learned from a Citizen Science Project for Natural Language Processing

Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often…

Computation and Language · Computer Science 2023-04-26 Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych

CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators

Real-world data for classification is often labeled by multiple annotators. For analyzing such data, we introduce CROWDLAB, a straightforward approach to utilize any trained classifier to estimate: (1) A consensus label for each example…

Machine Learning · Computer Science 2023-01-30 Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller

Learning from Crowds by Modeling Common Confusions

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the…

Machine Learning · Computer Science 2021-06-15 Zhendong Chu, Jing Ma, Hongning Wang

Efficiently Crowdsourcing Visual Importance with Punch-Hole Annotation

We introduce a novel crowdsourcing method for identifying important areas in graphical images through punch-hole labeling. Traditional methods, such as gaze trackers and mouse-based annotations, which generate continuous data, can be…

Human-Computer Interaction · Computer Science 2024-09-17 Minsuk Chang, Soohyun Lee, Aeri Cho, Hyeon Jeon, Seokhyeon Park, Cindy Xiong Bearfield, Jinwook Seo

Crowdsourcing of Real-world Image Annotation via Visual Properties

Recent advances in data-centric artificial intelligence highlight inherent limitations in object recognition datasets. One of the primary issues stems from the semantic gap problem, which results in complex many-to-many mappings between…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Xiaolei Diao, Fausto Giunchiglia

CrowdAgent: Multi-Agent Managed Multi-Source Annotation System

High-quality annotated data is a cornerstone of modern Natural Language Processing (NLP). While recent methods begin to leverage diverse annotation sources, including Large Language Models (LLMs), Small Language Models (SLMs), and human…

Artificial Intelligence · Computer Science 2025-09-18 Maosheng Qin, Renyu Zhu, Mingxuan Xia, Chenkai Chen, Zhen Zhu, Minmin Lin, Junbo Zhao, Lu Xu, Changjie Fan, Runze Wu, Haobo Wang

No Need to Sacrifice Data Quality for Quantity: Crowd-Informed Machine Annotation for Cost-Effective Understanding of Visual Data

Labeling visual data is expensive and time-consuming. Crowdsourcing systems promise to enable highly parallelizable annotations through the participation of monetarily or otherwise motivated workers, but even this approach has its limits.…

Human-Computer Interaction · Computer Science 2024-09-04 Christopher Klugmann, Rafid Mahmood, Guruprasad Hegde, Amit Kale, Daniel Kondermann

Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks

Disagreement in annotation is a common phenomenon in the development of NLP datasets and serves as a valuable source of insight. While majority voting remains the dominant strategy for aggregating labels, recent work has explored modeling…

Computation and Language · Computer Science 2026-05-12 Tadesse Destaw Belay, Ibrahim Said Ahmad, Idris Abdulmumin, Abinew Ali Ayele, Alexander Gelbukh, Eusebio Ricárdez-Vázquez, Olga Kolesnikova, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam

A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation

Whether Large Language Models (LLMs) can outperform crowdsourcing on the data annotation task is attracting interest recently. Some works verified this issue with the average performance of individual crowd workers and LLM workers on some…

Computation and Language · Computer Science 2024-01-19 Jiyi Li

Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective

One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data…

Signal Processing · Electrical Eng. & Systems 2025-07-04 Shahana Ibrahim, Panagiotis A. Traganitis, Xiao Fu, Georgios B. Giannakis

The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help

We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such…

Human-Computer Interaction · Computer Science 2021-11-16 Danula Hettiachchi, Mike Schaekermann, Tristan McKinney, Matthew Lease

On Releasing Annotator-Level Labels and Information in Datasets

A common practice in building NLP datasets, especially using crowd-sourced annotations, involves obtaining multiple annotator judgements on the same data instances, which are then flattened to produce a single "ground truth" label or score,…

Computation and Language · Computer Science 2021-10-13 Vinodkumar Prabhakaran, Aida Mostafazadeh Davani, Mark Díaz

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

Humans (e.g., crowdworkers) have a remarkable ability in solving different tasks, by simply reading textual instructions that define them and looking at a few examples. Despite the success of the conventional supervised learning on…

Computation and Language · Computer Science 2022-03-15 Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi

Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification

Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-labeled images. A recent trend for conducting such laboursome…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Jianzhe Lin, Tianze Yu, Z. Jane Wang

Toward Annotator Group Bias in Crowdsourcing

Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though there are a few works investigating individual…

Human-Computer Interaction · Computer Science 2021-10-18 Haochen Liu, Joseph Thekinen, Sinem Mollaoglu, Da Tang, Ji Yang, Youlong Cheng, Hui Liu, Jiliang Tang

Coupled Confusion Correction: Learning from Crowds with Sparse Annotations

As the size of the datasets getting larger, accurately annotating such datasets is becoming more impractical due to the expensiveness on both time and economy. Therefore, crowd-sourcing has been widely adopted to alleviate the cost of…

Machine Learning · Computer Science 2024-02-21 Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge

Clean or Annotate: How to Spend a Limited Data Collection Budget

Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of such noise.…

Computation and Language · Computer Science 2022-06-14 Derek Chen, Zhou Yu, Samuel R. Bowman

Clustrophile: A Tool for Visual Clustering Analysis

While clustering is one of the most popular methods for data mining, analysts lack adequate tools for quick, iterative clustering analysis, which is essential for hypothesis generation and data reasoning. We introduce Clustrophile, an…

Human-Computer Interaction · Computer Science 2017-10-09 Çağatay Demiralp