Related papers: Efficient Failure Pattern Identification of Predic…

Identifying Wrongly Predicted Samples: A Method for Active Learning

State-of-the-art machine learning models require access to a significant amount of annotated data to achieve the desired level of performance. While unlabelled data may be widely available and even abundant, the annotation process can…

Machine Learning · Computer Science · 2020-10-15 · Rahaf Aljundi, Nikolay Chumerin, Daniel Olmeda Reino

Ensemble Learning Based Classification Algorithm Recommendation

Recommending appropriate algorithms for a classification problem is one of the most challenging issues in the field of data mining. The existing algorithm recommendation models are generally constructed on only one kind of meta-feature by…

Information Retrieval · Computer Science · 2021-06-08 · Guangtao Wang, Qinbao Song, Xiaoyan Zhu

Identifying Mislabeled Instances in Classification Datasets

A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because in many cases this process cannot be done by machines, labeling needs to be…

Machine Learning · Computer Science · 2019-12-12 · Nicolas Michael Müller, Karla Markert

Towards Automated Negative Sampling in Implicit Recommendation

Negative sampling methods are vital in implicit recommendation models as they allow us to obtain negative instances from massive unlabeled data. Most existing approaches focus on sampling hard negative samples in various ways. These studies…

Information Retrieval · Computer Science · 2023-11-08 · Fuyuan Lyu, Yaochen Hu, Xing Tang, Yingxue Zhang, Ruiming Tang, Xue Liu

Learning from Stochastic Labels

Annotating multi-class instances is a crucial task in the field of machine learning. Unfortunately, identifying the correct class label from a long sequence of candidate labels is time-consuming and laborious. To alleviate this problem, we…

Machine Learning · Computer Science · 2025-12-05 · Meng Wei, Zhongnian Li, Yong Zhou, Qiaoyu Guo, Xinzheng Xu

Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings

Deploying machine learning models in safety-critical domains poses a key challenge: ensuring reliable model performance on downstream user data without access to ground truth labels for direct validation. We propose the suitability filter,…

Machine Learning · Computer Science · 2025-05-29 · Angéline Pouget, Mohammad Yaghini, Stephan Rabanser, Nicolas Papernot

Modeling and mitigating human annotation errors to design efficient stream processing systems with human-in-the-loop machine learning

High-quality human annotations are necessary for creating effective machine learning-driven stream processing systems. We study hybrid stream processing systems based on a Human-In-The-Loop Machine Learning (HITL-ML) paradigm, in which one…

Human-Computer Interaction · Computer Science · 2022-01-19 · Rahul Pandey, Hemant Purohit, Carlos Castillo, Valerie L. Shalin

Which is the best model for my data?

In this paper, we tackle the problem of selecting the optimal model for a given structured pattern classification dataset. In this context, a model can be understood as a classifier and a hyperparameter configuration. The proposed…

Machine Learning · Computer Science · 2022-10-27 · Gonzalo Nápoles, Isel Grau, Çiçek Güven, Orçun Özdemir, Yamisleydi Salgueiro

Improving Expert Predictions with Conformal Prediction

Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or…

Machine Learning · Computer Science · 2023-07-03 · Eleni Straitouri, Lequn Wang, Nastaran Okati, Manuel Gomez Rodriguez

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

When a deep learning model is deployed in the wild, it can encounter test data drawn from distributions different from the training data distribution and suffer a drop in performance. For safe deployment, it is essential to estimate the…

Machine Learning · Computer Science · 2023-05-16 · Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha

Identifying Mislabeled Training Data

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the…

Artificial Intelligence · Computer Science · 2011-06-02 · C. E. Brodley, M. A. Friedl

Auto-Annotation Quality Prediction for Semi-Supervised Learning with Ensembles

Auto-annotation by an ensemble of models is an efficient method of learning on unlabeled data. Wrong or inaccurate annotations generated by the ensemble may lead to performance degradation of the trained model. To deal with this problem, we…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-14 · Dror Simon, Miriam Farber, Roman Goldenberg

Label-Descriptive Patterns and Their Application to Characterizing Classification Errors

State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a classifier is prone to making…

Machine Learning · Computer Science · 2022-06-20 · Michael Hedderich, Jonas Fischer, Dietrich Klakow, Jilles Vreeken

CoSam: An Efficient Collaborative Adaptive Sampler for Recommendation

Sampling strategies have been widely applied in many recommendation systems to accelerate model learning from implicit feedback data. A typical strategy is to draw negative instances from a uniform distribution, which, however, will severely…

Information Retrieval · Computer Science · 2020-11-17 · Jiawei Chen, Chengquan Jiang, Can Wang, Sheng Zhou, Yan Feng, Chun Chen, Martin Ester, Xiangnan He

A Review of Meta-level Learning in the Context of Multi-component, Multi-level Evolving Prediction Systems

The exponential growth of volume, variety and velocity of data is raising the need for investigations of automated or semi-automated ways to extract useful patterns from the data. It requires deep expert knowledge and extensive…

Machine Learning · Computer Science · 2020-07-22 · Abbas Raza Ali, Marcin Budka, Bogdan Gabrys

Explainable Model-specific Algorithm Selection for Multi-Label Classification

Multi-label classification (MLC) is an ML task of predictive modeling in which a data instance can simultaneously belong to multiple classes. MLC is increasingly gaining interest in different application domains such as text mining,…

Machine Learning · Computer Science · 2022-11-22 · Ana Kostovska, Carola Doerr, Sašo Džeroski, Dragi Kocev, Panče Panov, Tome Eftimov

Limitations of Assessing Active Learning Performance at Runtime

Classification algorithms aim to predict an unknown label (e.g., a quality class) for a new instance (e.g., a product). Therefore, training samples (instances and labels) are used to deduce classification hypotheses. Often, it is relatively…

Machine Learning · Computer Science · 2019-01-30 · Daniel Kottke, Jim Schellinger, Denis Huseljic, Bernhard Sick

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…

Computation and Language · Computer Science · 2022-09-27 · Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Modeling Human Annotation Errors to Design Bias-Aware Systems for Social Stream Processing

High-quality human annotations are necessary to create effective machine learning systems for social media. Low-quality human annotations indirectly contribute to the creation of inaccurate or biased learning systems. We show that human…

Social and Information Networks · Computer Science · 2019-07-18 · Rahul Pandey, Carlos Castillo, Hemant Purohit

Learning Fast Matching Models from Weak Annotations

This paper proposes a novel training scheme for fast matching models in Search Ads, which is motivated by the real challenges in model training. The first challenge stems from the pursuit of high throughput, which prohibits the deployment…

Information Retrieval · Computer Science · 2019-04-23 · Xue Li, Zhipeng Luo, Hao Sun, Jianjin Zhang, Weihao Han, Xianqi Chu, Liangjie Zhang, Qi Zhang