Related papers: Detecting Label Errors in Token Classification Dat…

Learning to Detect Label Errors by Making Them: A Method for Segmentation and Object Detection Datasets

Recently, detection of label errors and improvement of label quality in datasets for supervised learning tasks has become an increasingly important goal in both research and industry. The consequences of incorrectly annotated data include…

Machine Learning · Computer Science 2025-08-26 Sarina Penquitt, Tobias Riedlinger, Timo Heller, Markus Reischl, Matthias Rottmann

Identifying Label Errors in Object Detection Datasets by Loss Inspection

Labeling datasets for supervised object detection is a dull and time-consuming task. Errors can be easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and performance degradation of deep neural…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Marius Schubert, Tobias Riedlinger, Karsten Kahl, Daniel Kröll, Sebastian Schoenen, Siniša Šegvić, Matthias Rottmann

Identifying Incorrect Annotations in Multi-Label Classification Data

In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes). Example applications include image (or document) tagging where each possible tag either applies to a…

Machine Learning · Computer Science 2022-11-28 Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller

Jointly Learning to Label Sentences and Tokens

Learning to construct text representations in end-to-end systems can be difficult, as natural languages are highly compositional and task-specific annotated datasets are often limited in size. Methods for directly supervising language…

Computation and Language · Computer Science 2018-11-15 Marek Rei, Anders Søgaard

SELECT: Detecting Label Errors in Real-world Scene Text Data

We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Utilizing an image-text encoder and a character-level tokenizer, SELECT…

Computer Vision and Pattern Recognition · Computer Science 2025-12-17 Wenjun Liu, Qian Wu, Yifeng Hu, Yuke Li

Estimating label quality and errors in semantic segmentation data via any model

The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular…

Machine Learning · Computer Science 2023-07-12 Vedang Lad, Jonas Mueller

Identifying Mislabeled Instances in Classification Datasets

A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because this process can in many cases not be done by machines, labeling needs to be…

Machine Learning · Computer Science 2019-12-12 Nicolas Michael Müller, Karla Markert

Zero-shot Sequence Labeling: Transferring Knowledge from Sentences to Tokens

Can attention- or gradient-based visualization techniques be used to infer token-level labels for binary sequence tagging problems, using networks trained only on sentence-level labels? We construct a neural network architecture based on…

Computation and Language · Computer Science 2018-05-08 Marek Rei, Anders Søgaard

Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

Mislabeled examples are ubiquitous in real-world machine learning datasets, advocating the development of techniques for automatic detection. We show that most mislabeled detection methods can be viewed as probing trained machine learning…

Machine Learning · Computer Science 2024-10-22 Thomas George, Pierre Nodet, Alexis Bondu, Vincent Lemaire

Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification

In this work, we for the first time present a method for detecting label errors in image datasets with semantic segmentation, i.e., pixel-wise class labels. Annotation acquisition for semantic segmentation datasets is time-consuming and…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Matthias Rottmann, Marco Reese

Enhanced Sample Selection with Confidence Tracking: Identifying Correctly Labeled yet Hard-to-Learn Samples in Noisy Data

We propose a novel sample selection method for image classification in the presence of noisy labels. Existing methods typically consider small-loss samples as correctly labeled. However, some correctly labeled samples are inherently…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Weiran Pan, Wei Wei, Feida Zhu, Yong Deng

Improve Text Classification Accuracy with Intent Information

Text classification, a core component of task-oriented dialogue systems, attracts continuous research from both the research and industry community, and has resulted in tremendous progress. However, existing methods do not consider the use…

Computation and Language · Computer Science 2022-12-16 Yifeng Xie

ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data

Despite powering sensitive systems like autonomous vehicles, object detection remains fairly brittle in part due to annotation errors that plague most real-world training datasets. We propose ObjectLab, a straightforward algorithm to detect…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Ulyana Tkachenko, Aditya Thyagarajan, Jonas Mueller

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…

Computation and Language · Computer Science 2022-09-27 Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning

Fine-tuning LLMs for classification typically maps inputs directly to labels. We ask whether attaching brief explanations to each label during fine-tuning yields better models. We evaluate conversational response quality along three axes:…

Machine Learning · Computer Science 2026-03-03 Vivswan Shah, Randy Cogill, Hanwei Yue, Gopinath Chennupati, Rinat Khaziev

Label Confusion Learning to Enhance Text Classification Models

Representing a true label as a one-hot vector is a common practice in training text classification models. However, the one-hot representation may not adequately reflect the relation between the instances and labels, as labels are often not…

Computation and Language · Computer Science 2020-12-10 Biyang Guo, Songqiao Han, Xiao Han, Hailiang Huang, Ting Lu

Text Classification Using Label Names Only: A Language Model Self-Training Approach

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled…

Computation and Language · Computer Science 2020-10-15 Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

Learning from Stochastic Labels

Annotating multi-class instances is a crucial task in the field of machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we…

Machine Learning · Computer Science 2025-12-05 Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science 2022-06-22 Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

Are Labels Always Necessary for Classifier Accuracy Evaluation?

To calculate the model accuracy on a computer vision task, e.g., object recognition, we usually require a test set consisting of test samples and their ground truth labels. Whilst standard usage cases satisfy this requirement, many…

Computer Vision and Pattern Recognition · Computer Science 2021-05-26 Weijian Deng, Liang Zheng