Related papers: ED2: Two-stage Active Learning for Error Detection…

Hierarchical Subquery Evaluation for Active Learning on a Graph

To train good supervised and semi-supervised object classifiers, it is critical that we not waste the time of the human experts who are providing the training labels. Existing active learning strategies can have uneven performance, being…

Computer Vision and Pattern Recognition · Computer Science 2015-05-01 Oisin Mac Aodha, Neill D. F. Campbell, Jan Kautz, Gabriel J. Brostow

Gradual Machine Learning for Entity Resolution

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most…

Databases · Computer Science 2019-06-17 Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li

Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation

Semantic segmentation demands dense pixel-level annotations, which can be prohibitively expensive - especially under extremely constrained labeling budgets. In this paper, we address the problem of low-budget active learning for semantic…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Jeongin Kim, Wonho Bae, YouLee Han, Giyeong Oh, Youngjae Yu, Danica J. Sutherland, Junhyug Noh

Identifying Wrongly Predicted Samples: A Method for Active Learning

State-of-the-art machine learning models require access to significant amount of annotated data in order to achieve the desired level of performance. While unlabelled data can be largely available and even abundant, annotation process can…

Machine Learning · Computer Science 2020-10-15 Rahaf Aljundi, Nikolay Chumerin, Daniel Olmeda Reino

A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching

Entity Matching (EM) is a core data cleaning task, aiming to identify different mentions of the same real-world entity. Active learning is one way to address the challenge of scarce labeled data in practice, by dynamically collecting the…

Databases · Computer Science 2020-03-31 Venkata Vamsikrishna Meduri, Lucian Popa, Prithviraj Sen, Mohamed Sarwat

Active Learning with Expected Error Reduction

Active learning has been studied extensively as a method for efficient data collection. Among the many approaches in literature, Expected Error Reduction (EER) (Roy and McCallum) has been shown to be an effective method for active learning:…

Machine Learning · Computer Science 2022-11-18 Stephen Mussmann, Julia Reisler, Daniel Tsai, Ehsan Mousavi, Shayne O'Brien, Moises Goldszmidt

Parting with Illusions about Deep Active Learning

Active learning aims to reduce the high labeling cost involved in training machine learning models on large datasets by efficiently labeling only the most informative samples. Recently, deep active learning has shown success on various…

Computer Vision and Pattern Recognition · Computer Science 2019-12-12 Sudhanshu Mittal, Maxim Tatarchenko, Özgün Çiçek, Thomas Brox

ZeroED: Hybrid Zero-shot Error Detection through Large Language Model Reasoning

Error detection (ED) in tabular data is crucial yet challenging due to diverse error types and the need for contextual understanding. Traditional ED methods often rely heavily on manual criteria and labels, making them labor-intensive.…

Machine Learning · Computer Science 2025-04-09 Wei Ni, Kaihang Zhang, Xiaoye Miao, Xiangyu Zhao, Yangyang Wu, Yaoshu Wang, Jianwei Yin

Experience-Enhanced Learning: One Size Still does not Fit All in Automatic Database

Recent years, the database committee has attempted to develop automatic database management systems. Although some researches show that the applying AI to data management is a significant and promising direction, there still exists many…

Databases · Computer Science 2021-11-23 Yu Yan, Hongzhi Wang, Jian Ma, Jian Geng, Yuzhuo Wang

Active Learning: Problem Settings and Recent Developments

In supervised learning, acquiring labeled training data for a predictive model can be very costly, but acquiring a large amount of unlabeled data is often quite easy. Active learning is a method of obtaining predictive models with high…

Machine Learning · Computer Science 2020-12-17 Hideitsu Hino

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…

Computation and Language · Computer Science 2022-09-27 Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych

Semantic Segmentation with Active Semi-Supervised Learning

Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Aneesh Rangnekar, Christopher Kanan, Matthew Hoffman

Adaptive Label Error Detection: A Bayesian Approach to Mislabeled Data Detection

Machine learning classification systems are susceptible to poor performance when trained with incorrect ground truth labels, even when data is well-curated by expert annotators. As machine learning becomes more widespread, it is…

Machine Learning · Computer Science 2026-01-16 Zan Chaudhry, Noam H. Rotenberg, Brian Caffo, Craig K. Jones, Haris I. Sair

Practical Edge Detection via Robust Collaborative Learning

Edge detection, as a core component in a wide range of vision-oriented tasks, is to identify object boundaries and prominent edges in natural images. An edge detector is desired to be both efficient and accurate for practical use. To achieve…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Yuanbin Fu, Xiaojie Guo

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

When a deep learning model is deployed in the wild, it can encounter test data drawn from distributions different from the training data distribution and suffer drop in performance. For safe deployment, it is essential to estimate the…

Machine Learning · Computer Science 2023-05-16 Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha

Limitations of Assessing Active Learning Performance at Runtime

Classification algorithms aim to predict an unknown label (e.g., a quality class) for a new instance (e.g., a product). Therefore, training samples (instances and labels) are used to deduct classification hypotheses. Often, it is relatively…

Machine Learning · Computer Science 2019-01-30 Daniel Kottke, Jim Schellinger, Denis Huseljic, Bernhard Sick

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data has a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

ADSEL: Adaptive dual self-expression learning for EEG feature selection via incomplete multi-dimensional emotional tagging

EEG based multi-dimension emotion recognition has attracted substantial research interest in human computer interfaces. However, the high dimensionality of EEG features, coupled with limited sample sizes, frequently leads to classifier…

Human-Computer Interaction · Computer Science 2025-08-08 Tianze Yu, Junming Zhang, Wenjia Dong, Xueyuan Xu, Li Zhuo

Active$^2$ Learning: Actively reducing redundancies in Active Learning methods for Sequence Tagging and Machine Translation

While deep learning is a powerful tool for natural language processing (NLP) problems, successful solutions to these problems rely heavily on large amounts of annotated samples. However, manually annotating data is expensive and…

Computation and Language · Computer Science 2021-04-06 Rishi Hazra, Parag Dutta, Shubham Gupta, Mohammed Abdul Qaathir, Ambedkar Dukkipati

Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data…

Computer Vision and Pattern Recognition · Computer Science 2018-05-30 Vishal Kaushal, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Suyash Shetty, Pankaj Singh, Rishabh Iyer, Ganesh Ramakrishnan