Related papers: Detecting Label Errors in Token Classification Dat…

200 papers

Recently, detection of label errors and improvement of label quality in datasets for supervised learning tasks has become an increasingly important goal in both research and industry. The consequences of incorrectly annotated data include…

Machine Learning · Computer Science · 2025-08-26 · Sarina Penquitt, Tobias Riedlinger, Timo Heller, Markus Reischl, Matthias Rottmann

Labeling datasets for supervised object detection is a dull and time-consuming task. Errors can be easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and performance degradation of deep neural…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-20 · Marius Schubert, Tobias Riedlinger, Karsten Kahl, Daniel Kröll, Sebastian Schoenen, Siniša Šegvić, Matthias Rottmann

In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes). Example applications include image (or document) tagging where each possible tag either applies to a…

Machine Learning · Computer Science · 2022-11-28 · Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller

Learning to construct text representations in end-to-end systems can be difficult, as natural languages are highly compositional and task-specific annotated datasets are often limited in size. Methods for directly supervising language…

Computation and Language · Computer Science · 2018-11-15 · Marek Rei, Anders Søgaard

We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Utilizing an image-text encoder and a character-level tokenizer, SELECT…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-17 · Wenjun Liu, Qian Wu, Yifeng Hu, Yuke Li

The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular…

Machine Learning · Computer Science · 2023-07-12 · Vedang Lad, Jonas Mueller

A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because in many cases this process cannot be done by machines, labeling needs to be…

Machine Learning · Computer Science · 2019-12-12 · Nicolas Michael Müller, Karla Markert

Can attention- or gradient-based visualization techniques be used to infer token-level labels for binary sequence tagging problems, using networks trained only on sentence-level labels? We construct a neural network architecture based on…

Computation and Language · Computer Science · 2018-05-08 · Marek Rei, Anders Søgaard

Mislabeled examples are ubiquitous in real-world machine learning datasets, advocating the development of techniques for automatic detection. We show that most mislabeled detection methods can be viewed as probing trained machine learning…

Machine Learning · Computer Science · 2024-10-22 · Thomas George, Pierre Nodet, Alexis Bondu, Vincent Lemaire

In this work, we present, for the first time, a method for detecting label errors in image datasets with semantic segmentation, i.e., pixel-wise class labels. Annotation acquisition for semantic segmentation datasets is time-consuming and…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-27 · Matthias Rottmann, Marco Reese

We propose a novel sample selection method for image classification in the presence of noisy labels. Existing methods typically consider small-loss samples as correctly labeled. However, some correctly labeled samples are inherently…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-25 · Weiran Pan, Wei Wei, Feida Zhu, Yong Deng

Text classification, a core component of task-oriented dialogue systems, attracts continuous attention from both the research and industry communities, and has seen tremendous progress. However, existing methods do not consider the use…

Computation and Language · Computer Science · 2022-12-16 · Yifeng Xie

Despite powering sensitive systems like autonomous vehicles, object detection remains fairly brittle in part due to annotation errors that plague most real-world training datasets. We propose ObjectLab, a straightforward algorithm to detect…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-06 · Ulyana Tkachenko, Aditya Thyagarajan, Jonas Mueller

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…

Computation and Language · Computer Science · 2022-09-27 · Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Fine-tuning LLMs for classification typically maps inputs directly to labels. We ask whether attaching brief explanations to each label during fine-tuning yields better models. We evaluate conversational response quality along three axes:…

Machine Learning · Computer Science · 2026-03-03 · Vivswan Shah, Randy Cogill, Hanwei Yue, Gopinath Chennupati, Rinat Khaziev

Representing a true label as a one-hot vector is a common practice in training text classification models. However, the one-hot representation may not adequately reflect the relation between the instances and labels, as labels are often not…

Computation and Language · Computer Science · 2020-12-10 · Biyang Guo, Songqiao Han, Xiao Han, Hailiang Huang, Ting Lu

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled…

Computation and Language · Computer Science · 2020-10-15 · Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

Annotating multi-class instances is a crucial task in the field of machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we…

Machine Learning · Computer Science · 2025-12-05 · Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science · 2022-06-22 · Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

To calculate model accuracy on a computer vision task, e.g., object recognition, we usually require a test set composed of test samples and their ground truth labels. Whilst standard use cases satisfy this requirement, many…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-26 · Weijian Deng, Liang Zheng