Related papers: Dataset Pruning: Reducing Training Data by Examini…

Large-scale Dataset Pruning with Dynamic Uncertainty

The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this…

Machine Learning · Computer Science 2024-06-17 Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require…

Machine Learning · Computer Science 2025-06-13 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset from the full dataset with comparable…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Furui Xu, Shaobo Wang, Jiajun Zhang, Chenghao Sun, Haixiang Tang, Linfeng Zhang

A Study in Dataset Pruning for Image Super-Resolution

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset…

Image and Video Processing · Electrical Eng. & Systems 2024-06-11 Brian B. Moser, Federico Raue, Andreas Dengel

Impact of Data Pruning on Machine Learning Algorithm Performance

Dataset pruning is the process of removing sub-optimal tuples from a dataset to improve the learning of a machine learning model. In this paper, we compared the performance of different algorithms, first on an unpruned dataset and then on…

Machine Learning · Computer Science 2019-01-31 Arun Thundyill Saseendran, Lovish Setia, Viren Chhabria, Debrup Chakraborty, Aneek Barman Roy

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Efficient Adversarial Training With Data Pruning

Neural networks are susceptible to adversarial examples: small input perturbations that cause models to fail. Adversarial training is one of the solutions that stops adversarial examples; models are exposed to attacks during training and…

Machine Learning · Computer Science 2022-07-05 Maximilian Kaufmann, Yiren Zhao, Ilia Shumailov, Robert Mullins, Nicolas Papernot

Effective Data Pruning through Score Extrapolation

Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model…

Machine Learning · Computer Science 2025-06-23 Sebastian Schmidt, Prasanga Dhungel, Christoffer Löffler, Björn Nieth, Stephan Günnemann, Leo Schwinn

Data Pruning in Generative Diffusion Models

Data pruning is the problem of identifying a core subset that is most beneficial to training and discarding the remainder. While pruning strategies are well studied for discriminative models like those used in classification, little…

Machine Learning · Computer Science 2025-03-17 Rania Briq, Jiangtao Wang, Stefan Kesselheim

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…

Machine Learning · Computer Science 2023-11-21 Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, Sijia Liu

Data Dropout: Optimizing Training Data for Convolutional Neural Networks

Deep learning models learn to fit training data while they are highly expected to generalize well to testing data. Most works aim at finding such models by creatively designing architectures and fine-tuning parameters. To adapt to…

Computer Vision and Pattern Recognition · Computer Science 2018-09-10 Tianyang Wang, Jun Huan, Bo Li

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…

Machine Learning · Computer Science 2026-05-25 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Joey Tianyi Zhou

Pruning Before Training May Improve Generalization, Provably

It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but also sometimes even…

Machine Learning · Computer Science 2023-01-31 Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang

Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation

Their vulnerability to small, imperceptible attacks limits the adoption of deep learning models to real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a…

Machine Learning · Computer Science 2024-07-12 Björn Nieth, Thomas Altstidl, Leo Schwinn, Björn Eskofier

Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models

Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on general corpus and then fine-tuned on…

Computation and Language · Computer Science 2024-07-15 Mohammadreza Tayaranian, Seyyed Hasan Mozafari, Brett H. Meyer, James J. Clark, Warren J. Gross

An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network

In recent years, deep neural networks have known a wide success in various application domains. However, they require important computational and memory resources, which severely hinders their deployment, notably on mobile devices or for…

Computer Vision and Pattern Recognition · Computer Science 2021-12-16 Nathan Hubens, Matei Mancas, Bernard Gosselin, Marius Preda, Titus Zaharia

DRoP: Distributionally Robust Data Pruning

In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the…

Machine Learning · Computer Science 2025-02-11 Artem Vysogorets, Kartik Ahuja, Julia Kempe

Dataset Pruning in RecSys and ML: Best Practice or Mal-Practice?

Offline evaluations in recommender system research depend heavily on datasets, many of which are pruned, such as the widely used MovieLens collections. This thesis examines the impact of data pruning - specifically, removing users with…

Information Retrieval · Computer Science 2025-10-17 Leonie Winter

A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer

The widespread availability of pre-trained vision models has enabled numerous deep learning applications through their transferable representations. However, their computational and storage costs often limit practical deployment.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Leonardo Iurada, Beatrice Occhiena, Tatiana Tommasi