Related papers: Language Model-Driven Data Pruning Enables Efficie…

200 papers

Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…

Machine Learning · Computer Science 2026-05-25 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Active learning is a machine learning approach for reducing the data labeling effort. Given a pool of unlabeled samples, it tries to select the most useful ones to label so that a model built from them can achieve the best possible…

Machine Learning · Computer Science 2020-03-31 Dongrui Wu

While pre-trained language model (PLM) fine-tuning has achieved strong performance in many NLP tasks, the fine-tuning stage can still be demanding in labeled data. Recent works have resorted to active fine-tuning to improve the label…

Computation and Language · Computer Science 2022-05-04 Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang

Process Reward Models (PRMs) provide step-level supervision to large language models (LLMs), but scaling up training data annotation remains challenging for both humans and LLMs. To address this limitation, we propose an active learning…

Machine Learning · Computer Science 2025-04-16 Keyu Duan, Zichen Liu, Xin Mao, Tianyu Pang, Changyu Chen, Qiguang Chen, Michael Qizhe Shieh, Longxu Dou

Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many…

Computation and Language · Computer Science 2022-05-10 Akim Tsvigun, Artem Shelmanov, Gleb Kuzmin, Leonid Sanochkin, Daniil Larionov, Gleb Gusev, Manvel Avetisian, Leonid Zhukov

During recent years, active learning has evolved into a popular paradigm for utilizing user's feedback to improve accuracy of learning algorithms. Active learning works by selecting the most informative sample among unlabeled data and…

Machine Learning · Computer Science 2016-11-17 Alireza Ghasemi, Hamid R. Rabiee, Mohsen Fadaee, Mohammad T. Manzuri, Mohammad H. Rohban

Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot…

Machine Learning · Statistics 2023-12-01 Davide Cacciarelli, Murat Kulahci

The considerable size of Large Language Models (LLMs) presents notable deployment challenges, particularly on resource-constrained hardware. Structured pruning offers an effective means to compress LLMs, thereby reducing storage costs and…

Computation and Language · Computer Science 2024-06-28 Shengrui Li, Junzhe Chen, Xueting Han, Jing Bai

In many real-world machine learning applications, unlabeled data can be easily obtained, but it is very time-consuming and/or expensive to label them. So, it is desirable to be able to select the optimal samples to label, so that a good…

Machine Learning · Computer Science 2020-01-16 Ziang Liu, Dongrui Wu

Active learning emerged as an alternative to alleviate the effort of labeling huge amounts of data for data-hungry applications (such as image/video indexing and retrieval, autonomous driving, etc.). The goal of active learning is to…

Computer Vision and Pattern Recognition · Computer Science 2020-10-20 Minghan Li, Xialei Liu, Joost van de Weijer, Bogdan Raducanu

Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly. This benefits especially deep neural networks which generally require a huge number of…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jihyo Kim, Jeonghyeon Kim, Sangheum Hwang

The cost of annotating transcriptions for large speech corpora becomes a bottleneck to maximally enjoy the potential capacity of deep neural network-based automatic speech recognition models. In this paper, we present a new training…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-06 Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha

Active learning, a powerful paradigm in machine learning, aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive…

Machine Learning · Computer Science 2024-01-17 Gábor Németh, Tamás Matuszka

Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and iteratively selects…

Machine Learning · Computer Science 2023-02-13 Ryoma Sato

Existing active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can vary substantially across datasets, models, and even…

Machine Learning · Computer Science 2026-03-27 Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Bernhard Sick

Active Learning (AL) is a powerful tool for learning with less labeled data, in particular, for specialized domains, like legal documents, where unlabeled data is abundant, but the annotation requires domain expertise and is thus expensive.…

Computation and Language · Computer Science 2022-11-16 Sepideh Mamooler, Rémi Lebret, Stéphane Massonnet, Karl Aberer

Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high value data that can best improve model performance. In pool-based active learning, accessible unlabeled data…

Machine Learning · Computer Science 2020-07-21 Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister

Active learning (AL) is a principled strategy to reduce annotation cost in data-hungry deep learning. However, existing AL algorithms focus almost exclusively on unimodal data, overlooking the substantial annotation burden in multimodal…

Machine Learning · Computer Science 2026-04-24 Jiancheng Zhang, Yinglun Zhu

The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real-time. However, given the high annotation costs and the time required by quality…

Machine Learning · Statistics 2023-07-17 Davide Cacciarelli, Murat Kulahci, John Sølve Tyssedal

Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications. When labeled documents are scarce, active learning helps save annotation efforts but requires retraining of massive…

Machine Learning · Computer Science 2024-02-27 Artem Vysogorets, Achintya Gopal