Related papers: Dictionary-Learning-Based Data Pruning for System …
Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…
Previous studies have demonstrated that not each sample in a dataset is of equal importance during training. Data pruning aims to remove less important or informative samples while still achieving comparable results as training on the…
Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…
Active learning (AL) optimizes data labeling efficiency by selecting the most informative instances for annotation. A key component in this procedure is an acquisition function that guides the selection process and identifies the suitable…
Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant, datasets from these domains presents a…
The ever-increasing fine-tuning cost of large-scale pre-trained models gives rise to the importance of dataset pruning, which aims to reduce dataset size while maintaining task performance. However, existing dataset pruning methods require…
Deep learning models often require large amounts of data for training, leading to increased costs. It is particularly challenging in medical imaging, i.e., gathering distributed data for centralized training, and meanwhile, obtaining…
Deep learning models require an enormous amount of data for training. However, recently there is a shift in machine learning from model-centric to data-centric approaches. In data-centric approaches, the focus is to refine and improve the…
The computational burden and inherent redundancy of large-scale datasets challenge the training of contemporary machine learning models. Data pruning offers a solution by selecting smaller, informative subsets, yet existing methods…
Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more…
Finetuning large language models inflates the costs of NLU applications and remains the bottleneck of development cycles. Recent works in computer vision use data pruning to reduce training time. Pruned data selection with static methods is…
Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration. Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning. However,…
In recent years, two competitive time series classification models, namely, ROCKET and MINIROCKET, have garnered considerable attention due to their low training cost and high accuracy. However, they rely on a large number of random 1-D…
Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…
Compute-efficient training of language models has become an important issue. We consider data pruning for data-efficient training of LLMs. In this work, we consider a data pruning method based on information entropy. We propose that the…
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this…
Finding Minimal Unsatisfiable Subsets (MUSes) of binary constraints is a common problem in infeasibility analysis of over-constrained systems. However, because of the exponential search space of the problem, enumerating MUSes is extremely…
We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning frame work, namely criterion, method and scheduler, analyzing their…
Neural Machine Translation models are extremely data and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically…
In the context of variable selection, ensemble learning has gained increasing interest due to its great potential to improve selection accuracy and to reduce false discovery rate. A novel ordering-based selective ensemble learning strategy…