Related papers: Robust Logistic Regression using Shift Parameters …

Learning Deep Networks from Noisy Labels with Dropout Regularization

Large datasets often have unreliable labels, such as those obtained from Amazon's Mechanical Turk or social media platforms, and classifiers trained on mislabeled datasets often exhibit poor performance. We present a simple, effective…

Computer Vision and Pattern Recognition · Computer Science 2017-05-10 Ishan Jindal, Matthew Nokleby, Xuewen Chen

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

NLP benchmarks rely on standardized datasets for training and evaluating models and are crucial for advancing the field. Traditionally, expert annotations ensure high-quality labels; however, the cost of expert annotation does not scale…

Computation and Language · Computer Science 2025-09-15 Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart

Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming

A critical bottleneck in supervised machine learning is the need for large amounts of labeled data which is expensive and time consuming to obtain. However, it has been shown that a small amount of labeled data, while insufficient to…

Machine Learning · Computer Science 2022-03-11 Ayush Maheshwari, Krishnateja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer, Marina Danilevsky, Lucian Popa

Robust mislabel logistic regression without modeling mislabel probabilities

Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased…

Applications · Statistics 2017-02-21 Hung Hung, Zhi-Yu Jou, Su-Yun Huang

Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features

While mislabeled or ambiguously-labeled samples in the training set could negatively affect the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps to improve the generalization power. Training…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, Dejing Dou

Robust Feature Learning Against Noisy Labels

Supervised learning of deep neural networks heavily relies on large-scale datasets annotated by high-quality labels. In contrast, mislabeled samples can significantly degrade the generalization of models and result in memorizing samples,…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Tsung-Ming Tai, Yun-Jie Jhang, Wen-Jyi Hwang

"You are an expert annotator": Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling

Labeling corpora constitutes a bottleneck to create models for new tasks or domains. Large language models mitigate the issue with automatic corpus labeling methods, particularly for categorical annotations. Some NLP tasks such as emotion…

Computation and Language · Computer Science 2024-04-23 Christopher Bagdon, Prathamesh Karmalker, Harsha Gurulingappa, Roman Klinger

Analyze the Robustness of Classifiers under Label Noise

This study explores the robustness of label noise classifiers, aiming to enhance model resilience against noisy data in complex real-world scenarios. Label noise in supervised learning, characterized by erroneous or imprecise labels,…

Machine Learning · Computer Science 2023-12-13 Cheng Zeng, Yixuan Xu, Jiaqi Tian

Noise Correction on Subjective Datasets

Incorporating every annotator's perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose to learn a more accurate representation of…

Machine Learning · Computer Science 2024-06-05 Uthman Jinadu, Yi Ding

Robust Neural Network Classification via Double Regularization

The presence of mislabeled observations in data is a notoriously challenging problem in statistics and machine learning, associated with poor generalization properties for both traditional classifiers and, perhaps even more so, flexible…

Machine Learning · Statistics 2022-02-09 Olof Zetterqvist, Rebecka Jörnsten, Johan Jonasson

Dynamic Loss For Robust Learning

Label noise and class imbalance commonly coexist in real-world data. Previous works for robust learning, however, usually address either one type of the data biases and underperform when facing them both. To mitigate this gap, this work…

Machine Learning · Computer Science 2023-09-06 Shenwang Jiang, Jianan Li, Jizhou Zhang, Ying Wang, Tingfa Xu

Robustness and Reliability When Training With Noisy Labels

Labelling of data for supervised learning can be costly and time-consuming and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will…

Machine Learning · Statistics 2022-05-13 Amanda Olmin, Fredrik Lindsten

Learning From Noisy Singly-labeled Data

Supervised learning depends on annotated examples, which are taken to be the ground truth. But these labels often come from noisy crowdsourcing platforms, like Amazon Mechanical Turk. Practitioners typically collect multiple labels…

Machine Learning · Computer Science 2018-05-22 Ashish Khetan, Zachary C. Lipton, Anima Anandkumar

Multi-label and Multi-target Sampling of Machine Annotation for Computational Stance Detection

Data collection from manual labeling provides domain-specific and task-aligned supervision for data-driven approaches, and a critical mass of well-annotated resources is required to achieve reasonable performance in natural language…

Computation and Language · Computer Science 2023-11-09 Zhengyuan Liu, Hai Leong Chieu, Nancy F. Chen

Robust Deep Ordinal Regression Under Label Noise

The real-world data is often susceptible to label noise, which might constrict the effectiveness of the existing state of the art algorithms for ordinal regression. Existing works on ordinal regression do not take label noise into account.…

Machine Learning · Computer Science 2020-01-28 Bhanu Garg, Naresh Manwani

Learning Image Labels On-the-fly for Training Robust Classification Models

Current deep learning paradigms largely benefit from the tremendous amount of annotated data. However, the quality of the annotations often varies among labelers. Multi-observer studies have been conducted to study these annotation…

Computer Vision and Pattern Recognition · Computer Science 2020-10-05 Xiaosong Wang, Ziyue Xu, Dong Yang, Leo Tam, Holger Roth, Daguang Xu

Meta-learning Representations for Learning from Multiple Annotators

We propose a meta-learning method for learning from multiple noisy annotators. In many applications such as crowdsourcing services, labels for supervised learning are given by multiple annotators. Since the annotators have different skills…

Machine Learning · Computer Science 2025-06-13 Atsutoshi Kumagai, Tomoharu Iwata, Taishi Nishiyama, Yasutoshi Ida, Yasuhiro Fujiwara

Leveraging Noisy Manual Labels as Useful Information: An Information Fusion Approach for Enhanced Variable Selection in Penalized Logistic Regression

In large-scale supervised learning, penalized logistic regression (PLR) effectively mitigates overfitting through regularization, yet its performance critically depends on robust variable selection. This paper demonstrates that label noise…

Machine Learning · Computer Science 2026-02-16 Xiaofei Wu, Rongmei Liangse

Generating Labels for Regression of Subjective Constructs using Triplet Embeddings

Human annotations serve an important role in computational models where the target constructs under study are hidden, such as dimensions of affect. This is especially relevant in machine learning, where subjective labels derived from…

Machine Learning · Statistics 2020-02-19 Karel Mundnich, Brandon M. Booth, Benjamin Girault, Shrikanth Narayanan

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language models (LLMs) for annotation, but LLM-generated labels…

Machine Learning · Computer Science 2026-03-23 Lequan Lin, Dai Shi, Andi Han, Feng Chen, Qiuzheng Chen, Jiawen Li, Zhaoyang Li, Jiyuan Li, Zhenbang Sun, Junbin Gao