Related papers: Identifying Mislabeled Training Data

200 papers

Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy. A popular approach for handling outliers and mislabeled instances is to remove any instance…

Machine Learning · Computer Science 2013-12-17 Michael R. Smith, Tony Martinez

A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because in many cases this process cannot be done by machines, labeling needs to be…

Machine Learning · Computer Science 2019-12-12 Nicolas Michael Müller, Karla Markert

We investigate the problem of machine learning with mislabeled training data. We try to make the effects of mislabeled training data better understood through analysis of the basic model and equations that characterize the problem. This includes…

Machine Learning · Computer Science 2019-09-23 Herbert Gish, Jan Silovsky, Man-Ling Sung, Man-Hung Siu, William Hartmann, Zhuolin Jiang

Training data plays an essential role in modern applications of machine learning. However, gathering labeled training data is time-consuming. Therefore, labeling is often outsourced to less experienced users, or completely automated. This…

Computer Vision and Pattern Recognition · Computer Science 2020-06-11 Alex Bäuerle, Heiko Neumann, Timo Ropinski

Due to the overemphasis on the quantity of data, data quality has often been overlooked. However, not all training data points contribute equally to learning. In particular, a mislabeled data point might actively damage the performance of…

Machine Learning · Computer Science 2021-09-13 Vaibhav Pulastya, Gaurav Nuti, Yash Kumar Atri, Tanmoy Chakraborty

While mislabeled or ambiguously-labeled samples in the training set could negatively affect the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps to improve the generalization power. Training…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, Dejing Dou

Mislabeled samples are ubiquitous in real-world datasets as rule-based or expert labeling is usually based on incorrect assumptions or subject to biased opinions. Neural networks can "memorize" these mislabeled samples and, as a result,…

Machine Learning · Computer Science 2021-11-24 Katharina Rombach, Gabriel Michau, Olga Fink

Collecting large-scale annotated data inevitably introduces label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy…

Computer Vision and Pattern Recognition · Computer Science 2020-11-23 Songzhu Zheng, Pengxiang Wu, Aman Goswami, Mayank Goswami, Dimitris Metaxas, Chao Chen

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets…

Machine Learning · Computer Science 2018-11-16 Matthew Klawonn, Eric Heim, James Hendler

Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and…

Machine Learning · Statistics 2022-08-23 Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization…

Computer Vision and Pattern Recognition · Computer Science 2023-04-27 Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

Deep neural networks (DNNs) have been shown to overfit a dataset when trained with noisy labels for long enough. To overcome this problem, we present a simple and effective method, self-ensemble label filtering (SELF), to…

Computer Vision and Pattern Recognition · Computer Science 2019-10-07 Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox

Distant supervision provides a means to create a large amount of weakly labeled data at low cost for relation classification. However, the resulting labeled instances are very noisy, containing data with wrong labels. Many approaches have…

Computation and Language · Computer Science 2020-10-27 Zhenzhen Li, Jian-Yun Nie, Benyou Wang, Pan Du, Yuhan Zhang, Lixin Zou, Dongsheng Li

Mislabeled data is a pervasive issue that undermines the performance of machine learning systems in real-world applications. An effective approach to mitigate this problem is to detect mislabeled instances and subject them to special…

Machine Learning · Computer Science 2025-11-05 Ilies Chibane, Thomas George, Pierre Nodet, Vincent Lemaire

Falsely annotated samples, also known as noisy labels, can significantly harm the performance of deep learning models. Two main approaches for learning with noisy labels are global noise estimation and data filtering. Global noise…

Machine Learning · Computer Science 2025-07-31 Yuval Grinberg, Nimrod Harel, Jacob Goldberger, Ofir Lindenbaum

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model…

Machine Learning · Computer Science 2020-11-06 Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le

In learning with noisy labels, the sample selection approach, which regards small-loss data as correctly labeled during training, is very popular. However, losses are generated on-the-fly based on the model being trained with noisy labels,…

Machine Learning · Computer Science 2021-06-02 Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama

Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been…

Machine Learning · Statistics 2019-10-25 Xiuming Liu, Dave Zachariah, Johan Wågberg, Thomas B. Schön

We consider the problem of training a model in the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to…

Machine Learning · Computer Science 2019-06-04 Duc Tam Nguyen, Thi-Phuong-Nhung Ngo, Zhongyu Lou, Michael Klar, Laura Beggel, Thomas Brox