Related papers: Visualizing NLP annotations for Crowdsourcing

200 papers

Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding, for a vast number of…

Computer Vision and Pattern Recognition · Computer Science 2016-11-08 Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant

Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often…

Real-world data for classification is often labeled by multiple annotators. For analyzing such data, we introduce CROWDLAB, a straightforward approach to utilize any trained classifier to estimate: (1) A consensus label for each example…

Machine Learning · Computer Science 2023-01-30 Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the…

Machine Learning · Computer Science 2021-06-15 Zhendong Chu, Jing Ma, Hongning Wang

We introduce a novel crowdsourcing method for identifying important areas in graphical images through punch-hole labeling. Traditional methods, such as gaze trackers and mouse-based annotations, which generate continuous data, can be…

Human-Computer Interaction · Computer Science 2024-09-17 Minsuk Chang, Soohyun Lee, Aeri Cho, Hyeon Jeon, Seokhyeon Park, Cindy Xiong Bearfield, Jinwook Seo

Recent advances in data-centric artificial intelligence highlight inherent limitations in object recognition datasets. One of the primary issues stems from the semantic gap problem, which results in complex many-to-many mappings between…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Xiaolei Diao, Fausto Giunchiglia

High-quality annotated data is a cornerstone of modern Natural Language Processing (NLP). While recent methods begin to leverage diverse annotation sources, including Large Language Models (LLMs), Small Language Models (SLMs), and human…

Artificial Intelligence · Computer Science 2025-09-18 Maosheng Qin, Renyu Zhu, Mingxuan Xia, Chenkai Chen, Zhen Zhu, Minmin Lin, Junbo Zhao, Lu Xu, Changjie Fan, Runze Wu, Haobo Wang

Labeling visual data is expensive and time-consuming. Crowdsourcing systems promise to enable highly parallelizable annotations through the participation of monetarily or otherwise motivated workers, but even this approach has its limits.…

Human-Computer Interaction · Computer Science 2024-09-04 Christopher Klugmann, Rafid Mahmood, Guruprasad Hegde, Amit Kale, Daniel Kondermann

Disagreement in annotation is a common phenomenon in the development of NLP datasets and serves as a valuable source of insight. While majority voting remains the dominant strategy for aggregating labels, recent work has explored modeling…

Whether Large Language Models (LLMs) can outperform crowdsourcing on the data annotation task has recently attracted interest. Some works examined this question using the average performance of individual crowd workers and LLM workers on some…

Computation and Language · Computer Science 2024-01-19 Jiyi Li

One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data…

Signal Processing · Electrical Eng. & Systems 2025-07-04 Shahana Ibrahim, Panagiotis A. Traganitis, Xiao Fu, Georgios B. Giannakis

We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such…

Human-Computer Interaction · Computer Science 2021-11-16 Danula Hettiachchi, Mike Schaekermann, Tristan McKinney, Matthew Lease

A common practice in building NLP datasets, especially using crowd-sourced annotations, involves obtaining multiple annotator judgements on the same data instances, which are then flattened to produce a single "ground truth" label or score,…

Computation and Language · Computer Science 2021-10-13 Vinodkumar Prabhakaran, Aida Mostafazadeh Davani, Mark Díaz

Humans (e.g., crowdworkers) have a remarkable ability in solving different tasks, by simply reading textual instructions that define them and looking at a few examples. Despite the success of the conventional supervised learning on…

Computation and Language · Computer Science 2022-03-15 Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi

Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-labeled images. A recent trend for conducting such laboursome…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Jianzhe Lin, Tianze Yu, Z. Jane Wang

Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though there are a few works investigating individual…

Human-Computer Interaction · Computer Science 2021-10-18 Haochen Liu, Joseph Thekinen, Sinem Mollaoglu, Da Tang, Ji Yang, Youlong Cheng, Hui Liu, Jiliang Tang

As datasets grow larger, accurately annotating them is becoming more impractical because of the cost in both time and money. Therefore, crowd-sourcing has been widely adopted to alleviate the cost of…

Machine Learning · Computer Science 2024-02-21 Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge

Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of such noise.…

Computation and Language · Computer Science 2022-06-14 Derek Chen, Zhou Yu, Samuel R. Bowman

While clustering is one of the most popular methods for data mining, analysts lack adequate tools for quick, iterative clustering analysis, which is essential for hypothesis generation and data reasoning. We introduce Clustrophile, an…

Human-Computer Interaction · Computer Science 2017-10-09 Çağatay Demiralp