Related papers: Dictionary-Learning-Based Data Pruning for System …

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification

Previous studies have demonstrated that not each sample in a dataset is of equal importance during training. Data pruning aims to remove less important or informative samples while still achieving comparable results as training on the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-16 Zi Yang, Haojin Yang, Soumajit Majumder, Jorge Cardoso, Guillermo Gallego

Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning

Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…

Machine Learning · Computer Science 2025-07-18 Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou

Language Model-Driven Data Pruning Enables Efficient Active Learning

Active learning (AL) optimizes data labeling efficiency by selecting the most informative instances for annotation. A key component in this procedure is an acquisition function that guides the selection process and identifies the suitable…

Machine Learning · Computer Science 2024-10-08 Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training

Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant, datasets from these domains presents a…

Machine Learning · Computer Science 2026-03-03 Wei Chen, Junle Chen, Yuqian Wu, Yuxuan Liang, Xiaofang Zhou

Exploring Learning Complexity for Efficient Downstream Dataset Pruning

The ever-increasing fine-tuning cost of large-scale pre-trained models gives rise to the importance of dataset pruning, which aims to reduce dataset size while maintaining task performance. However, existing dataset pruning methods require…

Machine Learning · Computer Science 2025-05-09 Wenyu Jiang, Zhenlong Liu, Zejian Xie, Songxin Zhang, Bingyi Jing, Hongxin Wei

Exploring Data Redundancy in Real-world Image Classification through Data Selection

Deep learning models often require large amounts of data for training, leading to increased costs. It is particularly challenging in medical imaging, i.e., gathering distributed data for centralized training, and meanwhile, obtaining…

Computer Vision and Pattern Recognition · Computer Science 2023-06-27 Zhenyu Tang, Shaoting Zhang, Xiaosong Wang

CLIP: Train Faster with Less Data

Deep learning models require an enormous amount of data for training. However, recently there is a shift in machine learning from model-centric to data-centric approaches. In data-centric approaches, the focus is to refine and improve the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Muhammad Asif Khan, Ridha Hamila, Hamid Menouar

AdaDeDup: Adaptive Hybrid Data Pruning for Efficient Large-Scale Object Detection Training

The computational burden and inherent redundancy of large-scale datasets challenge the training of contemporary machine learning models. Data pruning offers a solution by selecting smaller, informative subsets, yet existing methods…

Computer Vision and Pattern Recognition · Computer Science 2025-07-02 Feiyang Kang, Nadine Chang, Maying Shen, Marc T. Law, Rafid Mahmood, Ruoxi Jia, Jose M. Alvarez

RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment

Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more…

Machine Learning · Computer Science 2025-06-27 Suorong Yang, Peijia Li, Furao Shen, Jian Zhao

NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks

Finetuning large language models inflates the costs of NLU applications and remains the bottleneck of development cycles. Recent works in computer vision use data pruning to reduce training time. Pruned data selection with static methods is…

Computation and Language · Computer Science 2023-06-07 Jean-Michel Attendu, Jean-Philippe Corbeil

TT-MPD: Test Time Model Pruning and Distillation

Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration. Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-11 Haihang Wu, Wei Wang, Tamasha Malepathirana, Sachith Seneviratne, Denny Oetomo, Saman Halgamuge

POCKET: Pruning Random Convolution Kernels for Time Series Classification from a Feature Selection Perspective

In recent years, two competitive time series classification models, namely, ROCKET and MINIROCKET, have garnered considerable attention due to their low training cost and high accuracy. However, they rely on a large number of random 1-D…

Machine Learning · Computer Science 2024-07-26 Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…

Machine Learning · Computer Science 2026-05-25 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Measuring Sample Importance in Data Pruning for Language Models based on Information Entropy

Compute-efficient training of language models has become an important issue. We consider data pruning for data-efficient training of LLMs. In this work, we consider a data pruning method based on information entropy. We propose that the…

Artificial Intelligence · Computer Science 2024-12-13 Minsang Kim, Seungjun Baek

Large-scale Dataset Pruning with Dynamic Uncertainty

The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this…

Machine Learning · Computer Science 2024-06-17 Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao

Graph Pruning for Enumeration of Minimal Unsatisfiable Subsets

Finding Minimal Unsatisfiable Subsets (MUSes) of binary constraints is a common problem in infeasibility analysis of over-constrained systems. However, because of the exponential search space of the problem, enumerating MUSes is extremely…

Artificial Intelligence · Computer Science 2024-02-27 Panagiotis Lymperopoulos, Liping Liu

Neural Language Model Pruning for Automatic Speech Recognition

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science 2023-10-06 Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning

Neural Machine Translation models are extremely data and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically…

Computation and Language · Computer Science 2024-06-24 Everlyn Asiko Chimoto, Jay Gala, Orevaoghene Ahia, Julia Kreutzer, Bruce A. Bassett, Sara Hooker

Pruning variable selection ensembles

In the context of variable selection, ensemble learning has gained increasing interest due to its great potential to improve selection accuracy and to reduce false discovery rate. A novel ordering-based selective ensemble learning strategy…

Machine Learning · Statistics 2017-04-28 Chunxia Zhang, Yilei Wu, Mu Zhu