Related papers: Learning From Noisy Singly-labeled Data

Label Selection Approach to Learning from Crowds

Supervised learning, especially supervised deep learning, requires large amounts of labeled data. One approach to collect large amounts of labeled data is by using a crowdsourcing platform where numerous workers perform the annotation…

Machine Learning · Computer Science 2023-08-22 Kosuke Yoshimura, Hisashi Kashima

Meta-learning Representations for Learning from Multiple Annotators

We propose a meta-learning method for learning from multiple noisy annotators. In many applications such as crowdsourcing services, labels for supervised learning are given by multiple annotators. Since the annotators have different skills…

Machine Learning · Computer Science 2025-06-13 Atsutoshi Kumagai, Tomoharu Iwata, Taishi Nishiyama, Yasutoshi Ida, Yasuhiro Fujiwara

Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion

The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective noisy estimates of the "truth" under the influence of their varying…

Machine Learning · Computer Science 2019-06-18 Ryutaro Tanno, Ardavan Saeedi, Swami Sankaranarayanan, Daniel C. Alexander, Nathan Silberman

Exploiting Class Learnability in Noisy Data

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets…

Machine Learning · Computer Science 2018-11-16 Matthew Klawonn, Eric Heim, James Hendler

CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific Perturbations for Tabular Data

Samples with ground truth labels may not always be available in numerous domains. While learning from crowdsourcing labels has been explored, existing models can still fail in the presence of sparse, unreliable, or diverging annotations.…

Machine Learning · Computer Science 2021-12-07 Mani Sotoodeh, Li Xiong, Joyce C. Ho

Clean or Annotate: How to Spend a Limited Data Collection Budget

Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of such noise.…

Computation and Language · Computer Science 2022-06-14 Derek Chen, Zhou Yu, Samuel R. Bowman

Learning from Multiple Annotator Noisy Labels via Sample-wise Label Fusion

Data lies at the core of modern deep learning. The impressive performance of supervised learning is built upon a base of massive accurately labeled data. However, in some real-world applications, accurate labeling might not be viable;…

Machine Learning · Computer Science 2022-07-26 Zhengqi Gao, Fan-Keng Sun, Mingran Yang, Sucheng Ren, Zikai Xiong, Marc Engeler, Antonio Burazer, Linda Wildling, Luca Daniel, Duane S. Boning

Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective

One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data…

Signal Processing · Electrical Eng. & Systems 2025-07-04 Shahana Ibrahim, Panagiotis A. Traganitis, Xiao Fu, Georgios B. Giannakis

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances

To achieve state-of-the-art performance, one still needs to train NER models on large-scale, high-quality annotated data, an asset that is both costly and time-intensive to accumulate. In contrast, real-world applications often resort to…

Computation and Language · Computer Science 2023-10-26 Zhendong Chu, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I Morariu, Jiuxiang Gu, Ani Nenkova

Learning from Crowds by Modeling Common Confusions

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the…

Machine Learning · Computer Science 2021-06-15 Zhendong Chu, Jing Ma, Hongning Wang

End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models

Labeling real-world datasets is time consuming but indispensable for supervised machine learning models. A common solution is to distribute the labeling task across a large number of non-expert workers via crowd-sourcing. Due to the varying…

Machine Learning · Computer Science 2020-11-16 Taraneh Younesian, Chi Hong, Amirmasoud Ghiassi, Robert Birke, Lydia Y. Chen

Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

Existing research on learning with noisy labels mainly focuses on synthetic label noise. Synthetic noise, though has clean structures which greatly enabled statistical analyses, often fails to model real-world noise patterns. The recent…

Machine Learning · Computer Science 2022-03-29 Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, Yang Liu

Topic Model Based Multi-Label Classification from the Crowd

Multi-label classification is a common supervised machine learning problem where each instance is associated with multiple classes. The key challenge in this problem is learning the correlations between the classes. An additional challenge…

Machine Learning · Computer Science 2016-04-05 Divya Padmanabhan, Satyanath Bhat, Shirish Shevade, Y. Narahari

Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification

Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-labeled images. A recent trend for conducting such laboursome…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Jianzhe Lin, Tianze Yu, Z. Jane Wang

To Aggregate or Not? Learning with Separate Noisy Labels

The rawly collected training data often comes with separate noisy labels collected from multiple imperfect annotators (e.g., via crowdsourcing). A typical way of using these separate labels is to first aggregate them into one and apply…

Machine Learning · Computer Science 2022-10-21 Jiaheng Wei, Zhaowei Zhu, Tianyi Luo, Ehsan Amid, Abhishek Kumar, Yang Liu

Doubly Robust Crowdsourcing

Large-scale labeled dataset is the indispensable fuel that ignites the AI revolution as we see today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk which provides noisy labels from…

Human-Computer Interaction · Computer Science 2022-03-15 Chong Liu, Yu-Xiang Wang

Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations

Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality…

Machine Learning · Computer Science 2024-07-08 Daniel Kałuża, Andrzej Janusz, Dominik Ślęzak

Learning to Learn from Noisy Labeled Data

Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There…

Machine Learning · Computer Science 2019-04-15 Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Learning with Neighbor Consistency for Noisy Labels

Recent advances in deep learning have relied on large, labelled datasets to train high-capacity models. However, collecting large datasets in a time- and cost-efficient manner often results in label noise. We present a method for learning…

Computer Vision and Pattern Recognition · Computer Science 2022-07-07 Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid

Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data in a quicker and cheaper way. However, these labels…

Machine Learning · Computer Science 2018-07-24 Michael A. Hedderich, Dietrich Klakow