Related papers: Detecting Label Errors by using Pre-Trained Langua…

200 papers

Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PLMs) have achieved substantial advancements in the field of natural language processing. However, in real-world scenarios, data labels are…

Computation and Language · Computer Science · 2023-11-03 · Song Wang, Zhen Tan, Ruocheng Guo, Jundong Li

To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-23 · Songzhu Zheng, Pengxiang Wu, Aman Goswami, Mayank Goswami, Dimitris Metaxas, Chao Chen

Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a large number of errors. A popular technique to overcome the negative effects of these…

Machine Learning · Computer Science · 2021-03-02 · Michael A. Hedderich, Dawei Zhu, Dietrich Klakow

Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods…

Computation and Language · Computer Science · 2023-05-19 · Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

NLP benchmarks rely on standardized datasets for training and evaluating models and are crucial for advancing the field. Traditionally, expert annotations ensure high-quality labels; however, the cost of expert annotation does not scale…

Computation and Language · Computer Science · 2025-09-15 · Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart

Label noise in real-world datasets encodes wrong correlation patterns and impairs the generalization of deep neural networks (DNNs). It is critical to find efficient ways to detect corrupted patterns. Current methods primarily focus on…

Machine Learning · Computer Science · 2022-06-22 · Zhaowei Zhu, Zihao Dong, Yang Liu

Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There…

Machine Learning · Computer Science · 2019-04-15 · Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise,…

Machine Learning · Computer Science · 2020-08-28 · Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible…

Machine Learning · Computer Science · 2022-12-29 · Zhihao Wang, Zongyu Lin, Peiqi Liu, Guidong Zheng, Junjie Wen, Xianxin Chen, Yujun Chen, Zhilin Yang

Training deep networks with noisy labels leads to poor generalization and degraded accuracy due to overfitting to label noise. Existing approaches for learning with noisy labels often rely on the availability of a clean subset of data. By…

Machine Learning · Computer Science · 2025-11-27 · David Szczecina, Nicholas Pellegrino, Paul Fieguth

Mislabeled samples are ubiquitous in real-world datasets as rule-based or expert labeling is usually based on incorrect assumptions or subject to biased opinions. Neural networks can "memorize" these mislabeled samples and, as a result,…

Machine Learning · Computer Science · 2021-11-24 · Katharina Rombach, Gabriel Michau, Olga Fink

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science · 2022-06-22 · Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

Noise in data appears to be inevitable in most real-world machine learning applications and would cause severe overfitting problems. Not only can data features contain noise, but labels are also prone to be noisy due to human input. In this…

Machine Learning · Computer Science · 2025-05-09 · Weipeng Huang, Qin Li, Yang Xiao, Cheng Qiao, Tie Cai, Junwei Liang, Neil J. Hurley, Guangyuan Piao

Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to…

Machine Learning · Computer Science · 2018-02-27 · David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit

Available training data for named entity recognition (NER) often contains a significant percentage of incorrect labels for entity types and entity boundaries. Such label noise poses challenges for supervised learning and may significantly…

Computation and Language · Computer Science · 2024-10-15 · Elena Merdjanovska, Ansar Aynetdinov, Alan Akbik

Deep learning models rely heavily on large volumes of labeled data to achieve high performance. However, real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process.…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-09 · Gouranga Bala, Anuj Gupta, Subrat Kumar Behera, Amit Sethi

The growing importance of massive datasets used for deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling, non-expert labeling, and label corruption by…

Machine Learning · Computer Science · 2019-01-30 · Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel

Recent deep neural networks (DNNs) can easily overfit to biased training data with noisy labels. Label correction strategy is commonly used to alleviate this issue by designing a method to identify suspected noisy labels and then correct…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-03 · Yichen Wu, Jun Shu, Qi Xie, Qian Zhao, Deyu Meng

While mislabeled or ambiguously-labeled samples in the training set could negatively affect the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps to improve the generalization power. Training…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-21 · Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, Dejing Dou

Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the self-training method in the pre-training step and the fine-tuning step. Towards…

Computation and Language · Computer Science · 2023-02-17 · Tong Guo