Related papers: Efficient Online Crowdsourcing with Complex Annota…

Multi-Label Annotation Aggregation in Crowdsourcing

As a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous…

Machine Learning · Computer Science 2020-10-20 Xuan Wei, Daniel Dajun Zeng, Junming Yin

Attention-Aware Answers of the Crowd

Crowdsourcing is a relatively economic and efficient solution to collect annotations from the crowd through online platforms. Answers collected from workers with different expertise may be noisy and unreliable, and the quality of annotated…

Machine Learning · Computer Science 2020-01-08 Jingzheng Tu, Guoxian Yu, Jun Wang, Carlotta Domeniconi, Xiangliang Zhang

CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators

Real-world data for classification is often labeled by multiple annotators. For analyzing such data, we introduce CROWDLAB, a straightforward approach to utilize any trained classifier to estimate: (1) A consensus label for each example…

Machine Learning · Computer Science 2023-01-30 Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller

An Active Learning Approach for Jointly Estimating Worker Performance and Annotation Reliability with Crowdsourced Data

Crowdsourcing platforms offer a practical solution to the problem of affordably annotating large datasets for training supervised classifiers. Unfortunately, poor worker performance frequently threatens to compromise annotation reliability,…

Machine Learning · Computer Science 2014-01-17 Liyue Zhao, Yu Zhang, Gita Sukthankar

Toward Effective Automated Content Analysis via Crowdsourcing

Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective…

Computation and Language · Computer Science 2021-04-06 Jiele Wu, Chau-Wai Wong, Xinyan Zhao, Xianpeng Liu

Learning from Crowds by Modeling Common Confusions

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the…

Machine Learning · Computer Science 2021-06-15 Zhendong Chu, Jing Ma, Hongning Wang

Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification

Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-labeled images. A recent trend for conducting such laborious…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Jianzhe Lin, Tianze Yu, Z. Jane Wang

Towards Long-term Annotators: A Supervised Label Aggregation Baseline

Relying on crowdsourced workers, data crowdsourcing platforms are able to efficiently provide vast amounts of labeled data. Due to the variability in the annotation quality of crowd workers, modern techniques resort to redundant annotations…

Human-Computer Interaction · Computer Science 2023-11-28 Haoyu Liu, Fei Wang, Minmin Lin, Runze Wu, Renyu Zhu, Shiwei Zhao, Kai Wang, Tangjie Lv, Changjie Fan

Inferring ground truth from multi-annotator ordinal data: a probabilistic approach

A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple…

Machine Learning · Statistics 2013-05-02 Balaji Lakshminarayanan, Yee Whye Teh

Coupled Confusion Correction: Learning from Crowds with Sparse Annotations

As datasets grow larger, accurately annotating them is becoming increasingly impractical because of the cost in both time and money. Therefore, crowd-sourcing has been widely adopted to alleviate the cost of…

Machine Learning · Computer Science 2024-02-21 Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge

A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation

Whether Large Language Models (LLMs) can outperform crowdsourcing on data annotation tasks has recently attracted interest. Some works have examined this question using the average performance of individual crowd workers and LLM workers on some…

Computation and Language · Computer Science 2024-01-19 Jiyi Li

Candidate Labeling for Crowd Learning

Crowdsourcing has become very popular among the machine learning community as a way to obtain labels that allow a ground truth to be estimated for a given dataset. In most of the approaches that use crowdsourced labels, annotators are asked…

Machine Learning · Statistics 2018-08-09 Iker Beñaran-Muñoz, Jerónimo Hernández-González, Aritz Pérez

Learning from Imperfect Annotations

Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…

Machine Learning · Computer Science 2020-04-08 Emmanouil Antonios Platanios, Maruan Al-Shedivat, Eric Xing, Tom Mitchell

A General Model for Aggregating Annotations Across Simple, Complex, and Multi-Object Annotation Tasks

Human annotations are vital to supervised learning, yet annotators often disagree on the correct label, especially as annotation tasks increase in complexity. A strategy to improve label quality is to ask multiple annotators to label the…

Machine Learning · Computer Science 2023-12-22 Alexander Braylan, Madalyn Marabella, Omar Alonso, Matthew Lease

Crowd Labeling: a survey

Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple-choice (or labeling) questions are a common type of problem solved by this approach. As an…

Artificial Intelligence · Computer Science 2014-09-04 Jafar Muhammadi, Hamid Reza Rabiee, Abbas Hosseini

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Yuan-Hong Liao, Amlan Kar, Sanja Fidler

Achieving Budget-optimality with Adaptive Schemes in Crowdsourcing

Crowdsourcing platforms provide marketplaces where task requesters can pay to get labels on their data. Such markets have emerged recently as popular venues for collecting annotations that are crucial in training machine learning models in…

Machine Learning · Computer Science 2017-08-28 Ashish Khetan, Sewoong Oh

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene…

Machine Learning · Computer Science 2019-09-30 Shahana Ibrahim, Xiao Fu, Nikos Kargas, Kejun Huang

The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help

We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such…

Human-Computer Interaction · Computer Science 2021-11-16 Danula Hettiachchi, Mike Schaekermann, Tristan McKinney, Matthew Lease

Clean or Annotate: How to Spend a Limited Data Collection Budget

Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of such noise.…

Computation and Language · Computer Science 2022-06-14 Derek Chen, Zhou Yu, Samuel R. Bowman