Related papers: Language Model-Driven Data Pruning Enables Efficie…

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…

Machine Learning · Computer Science 2026-05-25 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Pool-Based Sequential Active Learning for Regression

Active learning is a machine learning approach for reducing the data labeling effort. Given a pool of unlabeled samples, it tries to select the most useful ones to label so that a model built from them can achieve the best possible…

Machine Learning · Computer Science 2020-03-31 Dongrui Wu

AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models

While pre-trained language model (PLM) fine-tuning has achieved strong performance in many NLP tasks, the fine-tuning stage can still be demanding in labeled data. Recent works have resorted to active fine-tuning to improve the label…

Computation and Language · Computer Science 2022-05-04 Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang

Efficient Process Reward Model Training via Active Learning

Process Reward Models (PRMs) provide step-level supervision to large language models (LLMs), but scaling up training data annotation remains challenging for both humans and LLMs. To address this limitation, we propose an active learning…

Machine Learning · Computer Science 2025-04-16 Keyu Duan, Zichen Liu, Xin Mao, Tianyu Pang, Changyu Chen, Qiguang Chen, Michael Qizhe Shieh, Longxu Dou

Towards Computationally Feasible Deep Active Learning

Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many…

Computation and Language · Computer Science 2022-05-10 Akim Tsvigun, Artem Shelmanov, Gleb Kuzmin, Leonid Sanochkin, Daniil Larionov, Gleb Gusev, Manvel Avetisian, Leonid Zhukov

Active Learning from Positive and Unlabeled Data

During recent years, active learning has evolved into a popular paradigm for utilizing user's feedback to improve accuracy of learning algorithms. Active learning works by selecting the most informative sample among unlabeled data and…

Machine Learning · Computer Science 2016-11-17 Alireza Ghasemi, Hamid R. Rabiee, Mohsen Fadaee, Mohammad T. Manzuri, Mohammad H. Rohban

Active learning for data streams: a survey

Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot…

Machine Learning · Statistics 2023-12-01 Davide Cacciarelli, Murat Kulahci

NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models

The considerable size of Large Language Models (LLMs) presents notable deployment challenges, particularly on resource-constrained hardware. Structured pruning offers an effective means to compress LLMs, thereby reducing storage costs and…

Computation and Language · Computer Science 2024-06-28 Shengrui Li, Junzhe Chen, Xueting Han, Jing Bai

Unsupervised Pool-Based Active Learning for Linear Regression

In many real-world machine learning applications, unlabeled data can be easily obtained, but it is very time-consuming and/or expensive to label them. So, it is desirable to be able to select the optimal samples to label, so that a good…

Machine Learning · Computer Science 2020-01-16 Ziang Liu, Dongrui Wu

Learning to Rank for Active Learning: A Listwise Approach

Active learning emerged as an alternative to alleviate the effort to label huge amounts of data for data-hungry applications (such as image/video indexing and retrieval, autonomous driving, etc.). The goal of active learning is to…

Computer Vision and Pattern Recognition · Computer Science 2020-10-20 Minghan Li, Xialei Liu, Joost van de Weijer, Bogdan Raducanu

Deep Active Learning with Contrastive Learning Under Realistic Data Pool Assumptions

Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly. This benefits especially deep neural networks which generally require a huge number of…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jihyo Kim, Jeonghyeon Kim, Sangheum Hwang

Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples

The cost of annotating transcriptions for large speech corpora becomes a bottleneck to maximally enjoy the potential capacity of deep neural network-based automatic speech recognition models. In this paper, we present a new training…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-06 Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha

Compute-Efficient Active Learning

Active learning, a powerful paradigm in machine learning, aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive…

Machine Learning · Computer Science 2024-01-17 Gábor Németh, Tamás Matuszka

Active Learning from the Web

Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and iteratively selects…

Machine Learning · Computer Science 2023-02-13 Ryoma Sato

Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning

Existing active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can vary substantially across datasets, models, and even…

Machine Learning · Computer Science 2026-03-27 Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Bernhard Sick

An Efficient Active Learning Pipeline for Legal Text Classification

Active Learning (AL) is a powerful tool for learning with less labeled data, in particular, for specialized domains, like legal documents, where unlabeled data is abundant, but the annotation requires domain expertise and is thus expensive.…

Computation and Language · Computer Science 2022-11-16 Sepideh Mamooler, Rémi Lebret, Stéphane Massonnet, Karl Aberer

Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost

Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high-value data that can best improve model performance. In pool-based active learning, accessible unlabeled data…

Machine Learning · Computer Science 2020-07-21 Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister

Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data

Active learning (AL) is a principled strategy to reduce annotation cost in data-hungry deep learning. However, existing AL algorithms focus almost exclusively on unimodal data, overlooking the substantial annotation burden in multimodal…

Machine Learning · Computer Science 2026-04-24 Jiancheng Zhang, Yinglun Zhu

Stream-based active learning with linear models

The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real-time. However, given the high annotation costs and the time required by quality…

Machine Learning · Statistics 2023-07-17 Davide Cacciarelli, Murat Kulahci, John Sølve Tyssedal

Towards Efficient Active Learning in NLP via Pretrained Representations

Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications. When labeled documents are scarce, active learning helps save annotation efforts but requires retraining of massive…

Machine Learning · Computer Science 2024-02-27 Artem Vysogorets, Achintya Gopal