Related papers: Detecting Label Errors by using Pre-Trained Langua…

Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance

Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PLMs) have achieved substantial advancements in the field of natural language processing. However, in real-world scenarios, data labels are…

Computation and Language · Computer Science 2023-11-03 Song Wang, Zhen Tan, Ruocheng Guo, Jundong Li

Error-Bounded Correction of Noisy Labels

To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy…

Computer Vision and Pattern Recognition · Computer Science 2020-11-23 Songzhu Zheng, Pengxiang Wu, Aman Goswami, Mayank Goswami, Dimitris Metaxas, Chao Chen

Analysing the Noise Model Error for Realistic Noisy Label Data

Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a large number of errors. A popular technique to overcome the negative effects of these…

Machine Learning · Computer Science 2021-03-02 Michael A. Hedderich, Dawei Zhu, Dietrich Klakow

NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing

Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods…

Computation and Language · Computer Science 2023-05-19 Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

NLP benchmarks rely on standardized datasets for training and evaluating models and are crucial for advancing the field. Traditionally, expert annotations ensure high-quality labels; however, the cost of expert annotation does not scale…

Computation and Language · Computer Science 2025-09-15 Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart

Detecting Corrupted Labels Without Training a Model to Predict

Label noise in real-world datasets encodes wrong correlation patterns and impairs the generalization of deep neural networks (DNNs). It is critical to find efficient ways to detect corrupted patterns. Current methods primarily focus on…

Machine Learning · Computer Science 2022-06-22 Zhaowei Zhu, Zihao Dong, Yang Liu

Learning to Learn from Noisy Labeled Data

Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There…

Machine Learning · Computer Science 2019-04-15 Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise,…

Machine Learning · Computer Science 2020-08-28 Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Learning to Detect Noisy Labels Using Model-Based Features

Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible…

Machine Learning · Computer Science 2022-12-29 Zhihao Wang, Zongyu Lin, Peiqi Liu, Guidong Zheng, Junjie Wen, Xianxin Chen, Yujun Chen, Zhilin Yang

Pre-train to Gain: Robust Learning Without Clean Labels

Training deep networks with noisy labels leads to poor generalization and degraded accuracy due to overfitting to label noise. Existing approaches for learning with noisy labels often rely on the availability of a clean subset of data. By…

Machine Learning · Computer Science 2025-11-27 David Szczecina, Nicholas Pellegrino, Paul Fieguth

Improving Generalization of Deep Fault Detection Models in the Presence of Mislabeled Data

Mislabeled samples are ubiquitous in real-world datasets as rule-based or expert labeling is usually based on incorrect assumptions or subject to biased opinions. Neural networks can "memorize" these mislabeled samples and, as a result,…

Machine Learning · Computer Science 2021-11-24 Katharina Rombach, Gabriel Michau, Olga Fink

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science 2022-06-22 Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts

Noise in data appears to be inevitable in most real-world machine learning applications and would cause severe overfitting problems. Not only can data features contain noise, but labels are also prone to be noisy due to human input. In this…

Machine Learning · Computer Science 2025-05-09 Weipeng Huang, Qin Li, Yang Xiao, Cheng Qiao, Tie Cai, Junwei Liang, Neil J. Hurley, Guangyuan Piao

Deep Learning is Robust to Massive Label Noise

Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to…

Machine Learning · Computer Science 2018-02-27 David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit

NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition

Available training data for named entity recognition (NER) often contains a significant percentage of incorrect labels for entity types and entity boundaries. Such label noise poses challenges for supervised learning and may significantly…

Computation and Language · Computer Science 2024-10-15 Elena Merdjanovska, Ansar Aynetdinov, Alan Akbik

Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement

Deep learning models rely heavily on large volumes of labeled data to achieve high performance. However, real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Gouranga Bala, Anuj Gupta, Subrat Kumar Behera, Amit Sethi

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

The growing importance of massive datasets used for deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling, non-expert labeling, and label corruption by…

Machine Learning · Computer Science 2019-01-30 Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel

Learning to Purify Noisy Labels via Meta Soft Label Corrector

Recent deep neural networks (DNNs) can easily overfit to biased training data with noisy labels. Label correction strategy is commonly used to alleviate this issue by designing a method to identify suspected noisy labels and then correct…

Computer Vision and Pattern Recognition · Computer Science 2021-09-03 Yichen Wu, Jun Shu, Qi Xie, Qian Zhao, Deyu Meng

Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features

While mislabeled or ambiguously-labeled samples in the training set could negatively affect the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps to improve the generalization power. Training…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, Dejing Dou

Predictions For Pre-training Language Models

Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the self-training method in the pre-training step and the fine-tuning step. Towards…

Computation and Language · Computer Science 2023-02-17 Tong Guo