Related papers: Robust Learning for Text Classification with Multi…

Empirical Error Modeling Improves Robustness of Noisy Neural Sequence Labeling

Despite recent advances, standard sequence labeling systems often fail when processing noisy user-generated text or consuming the output of an Optical Character Recognition (OCR) process. In this paper, we improve the noise-aware training…

Computation and Language · Computer Science 2021-05-26 Marcin Namysl, Sven Behnke, Joachim Köhler

Understanding Model Robustness to User-generated Noisy Texts

Sensitivity of deep-neural models to input noise is known to be a challenging problem. In NLP, model performance often deteriorates with naturally occurring noise, such as spelling errors. To mitigate this issue, models may leverage…

Computation and Language · Computer Science 2021-11-18 Jakub Náplava, Martin Popel, Milan Straka, Jana Straková

Noisy Parallel Data Alignment

An ongoing challenge in current natural language processing is how its major advancements tend to disproportionately favor resource-rich languages, leaving a significant number of under-resourced languages behind. Due to the lack of…

Computation and Language · Computer Science 2023-02-13 Ruoyu Xie, Antonios Anastasopoulos

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-studied task. The performance of modern NLP models easily achieves an accuracy of more than 90% in many standard datasets for text classification in English (Xie et…

Computation and Language · Computer Science 2022-06-06 Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow

Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining

Advances in neural modeling have achieved state-of-the-art (SOTA) results on public natural language processing (NLP) benchmarks, at times surpassing human performance. However, there is a gap between public benchmarks and real-world…

Computation and Language · Computer Science 2023-02-14 Asa Cooper Stickland, Sailik Sengupta, Jason Krone, Saab Mansour, He He

Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation

Named entity recognition (NER) models often struggle with noisy inputs, such as those with spelling mistakes or errors generated by Optical Character Recognition processes, and learning a robust NER model is challenging. Existing robust NER…

Computation and Language · Computer Science 2024-07-29 Chaoyi Ai, Yong Jiang, Shen Huang, Pengjun Xie, Kewei Tu

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science 2022-06-22 Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

An Effective Label Noise Model for DNN Text Classification

Because large, human-annotated datasets suffer from labeling errors, it is crucial to be able to train deep neural networks in the presence of label noise. While training image classification models with label noise has received much…

Machine Learning · Computer Science 2019-03-19 Ishan Jindal, Daniel Pressel, Brian Lester, Matthew Nokleby

Deep learning models are not robust against noise in clinical text

Artificial Intelligence (AI) systems are attracting increasing interest in the medical domain due to their ability to learn complicated tasks that require human intelligence and expert knowledge. AI systems that utilize high-performance…

Computation and Language · Computer Science 2021-08-30 Milad Moradi, Kathrin Blagec, Matthias Samwald

Learning to Retrieve with Weakened Labels: Robust Training under Label Noise

Neural Encoders are frequently used in the NLP domain to perform dense retrieval tasks, for instance, to generate the candidate documents for a given query in question-answering tasks. However, sparse annotation and label noise in the…

Machine Learning · Computer Science 2025-12-16 Arnab Sharma

An Empirical Study on Noisy Label Learning for Program Understanding

Recently, deep learning models have been widely applied in program understanding tasks, and these models achieve state-of-the-art results on many benchmark datasets. A major challenge of deep learning for program understanding is that the…

Software Engineering · Computer Science 2024-01-02 Wenhan Wang, Yanzhou Li, Anran Li, Jian Zhang, Wei Ma, Yang Liu

Robust Feature Learning Against Noisy Labels

Supervised learning of deep neural networks heavily relies on large-scale datasets annotated by high-quality labels. In contrast, mislabeled samples can significantly degrade the generalization of models and result in memorizing samples,…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Tsung-Ming Tai, Yun-Jie Jhang, Wen-Jyi Hwang

Robustness and Reliability When Training With Noisy Labels

Labelling of data for supervised learning can be costly and time-consuming and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will…

Machine Learning · Statistics 2022-05-13 Amanda Olmin, Fredrik Lindsten

Learning Noise-Invariant Representations for Robust Speech Recognition

Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations to their inputs. Small amounts of noise can destroy the performance of an otherwise state-of-the-art model. To harden models against…

Audio and Speech Processing · Electrical Eng. & Systems 2018-07-19 Davis Liang, Zhiheng Huang, Zachary C. Lipton

Robust Testing for Deep Learning using Human Label Noise

In deep learning (DL) systems, label noise in training datasets often degrades model performance, as models may learn incorrect patterns from mislabeled data. The area of Learning with Noisy Labels (LNL) has introduced methods to…

Machine Learning · Computer Science 2024-12-03 Gordon Lim, Stefan Larson, Kevin Leach

On the Noise Robustness of In-Context Learning for Text Generation

Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that…

Computation and Language · Computer Science 2024-10-25 Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei

Noise tolerance of learning to rank under class-conditional label noise

Often, the data used to train ranking models is subject to label noise. For example, in web-search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the SERP, query…

Information Retrieval · Computer Science 2022-08-18 Dany Haddad

A Survey of Label-noise Representation Learning: Past, Present and Future

Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep…

Machine Learning · Computer Science 2021-02-23 Bo Han, Quanming Yao, Tongliang Liu, Gang Niu, Ivor W. Tsang, James T. Kwok, Masashi Sugiyama

Can LLMs Learn to Reason Robustly under Noisy Supervision?

Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In…

Machine Learning · Computer Science 2026-04-07 Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen

Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey

Image classification systems recently made a giant leap with the advancement of deep neural networks. However, these systems require an excessive amount of labeled data to be adequately trained. Gathering a correctly annotated dataset is…

Machine Learning · Computer Science 2021-01-19 Görkem Algan, Ilkay Ulusoy