Related papers: SwiftLearn: A Data-Efficient Training Method of De…

Not All Samples Are Created Equal: Deep Learning with Importance Sampling

Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on…

Machine Learning · Computer Science 2019-10-29 Angelos Katharopoulos, François Fleuret

Swift Sampler: Efficient Learning of Sampler by 10 Parameters

Data selection is essential for training deep learning models. An effective data sampler assigns proper sampling probability for training data and helps the model converge to a good local minimum with high performance. Previous studies in…

Machine Learning · Computer Science 2024-10-10 Jiawei Yao, Chuming Li, Canran Xiao

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Recent advances on deep learning models come at the price of formidable training cost. The increasing model size is one of the root causes, but another less-emphasized fact is that data scale is actually increasing at a similar speed as…

Machine Learning · Computer Science 2024-01-17 Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He

Biased Importance Sampling for Deep Neural Network Training

Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems. However, the lack of an efficient way to calculate the importance still hinders its application to Deep Learning. In this paper,…

Machine Learning · Computer Science 2017-09-14 Angelos Katharopoulos, François Fleuret

KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training

This paper proposes a method for hiding the least-important samples during the training of deep neural networks to increase efficiency, i.e., to reduce the cost of training. Using information about the loss and prediction confidence during…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-17 Truong Thao Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega, François Trahay, Mohamed Wahib

Variance Reduction in SGD by Distributed Importance Sampling

Humans are able to accelerate their learning by selecting training materials that are the most informative and at the appropriate level of difficulty. We propose a framework for distributing deep learning in which one set of workers search…

Machine Learning · Statistics 2016-04-19 Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio

Exploring Variance Reduction in Importance Sampling for Efficient DNN Training

Importance sampling is widely used to improve the efficiency of deep neural network (DNN) training by reducing the variance of gradient estimators. However, efficiently assessing the variance reduction relative to uniform sampling remains…

Machine Learning · Computer Science 2025-11-19 Takuro Kutsuna

Efficient NLP Model Finetuning via Multistage Data Filtering

As model finetuning is central to the modern NLP, we set to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To…

Computation and Language · Computer Science 2023-05-22 Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Dataset Distillation Meets Provable Subset Selection

Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset…

Machine Learning · Computer Science 2023-07-18 Murad Tukan, Alaa Maalouf, Margarita Osadchy

Online Importance Sampling for Stochastic Gradient Optimization

Machine learning optimization often depends on stochastic gradient descent, where the precision of gradient estimation is vital for model performance. Gradients are calculated from mini-batches formed by uniformly selecting data samples…

Machine Learning · Computer Science 2025-01-29 Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh

Smart Cuts: Enhance Active Learning for Vulnerability Detection by Pruning Hard-to-Learn Data

Vulnerability detection is crucial for identifying security weaknesses in software systems. However, training effective machine learning models for this task is often constrained by the high cost and expertise required for data annotation.…

Cryptography and Security · Computer Science 2025-08-19 Xiang Lan, Tim Menzies, Bowen Xu

Efficient Deep Representation Learning by Adaptive Latent Space Sampling

Supervised deep learning requires a large amount of training samples with annotations (e.g. label class for classification task, pixel- or voxel-wised label map for segmentation tasks), which are expensive and time-consuming to obtain.…

Computer Vision and Pattern Recognition · Computer Science 2020-04-14 Yuanhan Mo, Shuo Wang, Chengliang Dai, Rui Zhou, Zhongzhao Teng, Wenjia Bai, Yike Guo

Accelerating Deep Learning with Fixed Time Budget

The success of modern deep learning is attributed to two key elements: huge amounts of training data and large model sizes. Where a vast amount of data allows the model to learn more features, the large model architecture boosts the…

Machine Learning · Computer Science 2024-10-08 Muhammad Asif Khan, Ridha Hamila, Hamid Menouar

Self Regulated Learning Mechanism for Data Efficient Knowledge Distillation

Existing methods for distillation do not efficiently utilize the training data. This work presents a novel approach to perform distillation using only a subset of the training data, making it more data-efficient. For this purpose, the…

Machine Learning · Computer Science 2021-04-26 Sourav Mishra, Suresh Sundaram

Fast-DataShapley: Neural Modeling for Training Data Valuation

The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers' legitimate rights and fairly reward them for their contributions. Shapley value, a potent tool…

Machine Learning · Computer Science 2025-11-21 Haifeng Sun, Yu Xiong, Runze Wu, Xinyu Cai, Changjie Fan, Lan Zhang, Xiang-Yang Li

SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training

The effectiveness of large language models (LLMs) is often hindered by duplicated data in their extensive pre-training datasets. Current approaches primarily focus on detecting and removing duplicates, which risks the loss of valuable…

Computation and Language · Computer Science 2024-07-10 Nan He, Weichen Xiong, Hanwen Liu, Yi Liao, Lei Ding, Kai Zhang, Guohua Tang, Xiao Han, Wei Yang

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student. However, just like massive classification, large scale knowledge distillation also imposes heavy computational costs…

Machine Learning · Computer Science 2018-12-04 Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Weishi Zheng

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…

Machine Learning · Computer Science 2023-11-21 Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, Sijia Liu

AdaSelection: Accelerating Deep Learning Training through Data Subsampling

In this paper, we introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch to speed up the training of large-scale deep learning models without sacrificing model performance.…

Machine Learning · Computer Science 2023-06-21 Minghe Zhang, Chaosheng Dong, Jinmiao Fu, Tianchen Zhou, Jia Liang, Jia Liu, Bo Liu, Michinari Momma, Bryan Wang, Yan Gao, Yi Sun