Related papers: Scale Efficient Training for Large Datasets

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require…

Machine Learning · Computer Science 2025-06-13 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training

Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant, datasets from these domains presents a…

Machine Learning · Computer Science 2026-03-03 Wei Chen, Junle Chen, Yuqian Wu, Yuxuan Liang, Xiaofang Zhou

Effective Data Pruning through Score Extrapolation

Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model…

Machine Learning · Computer Science 2025-06-23 Sebastian Schmidt, Prasanga Dhungel, Christoffer Löffler, Björn Nieth, Stephan Günnemann, Leo Schwinn

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently…

Machine Learning · Computer Science 2024-04-29 Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

Large-scale Dataset Pruning with Dynamic Uncertainty

The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this…

Machine Learning · Computer Science 2024-06-17 Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao

UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset from the full dataset with comparable…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Furui Xu, Shaobo Wang, Jiajun Zhang, Chenghao Sun, Haixiang Tang, Linfeng Zhang

A Study in Dataset Pruning for Image Super-Resolution

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset…

Image and Video Processing · Electrical Eng. & Systems 2024-06-11 Brian B. Moser, Federico Raue, Andreas Dengel

Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning

Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…

Machine Learning · Computer Science 2025-07-18 Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou

Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression

Structured pruning and quantization are fundamental techniques used to reduce the size of deep neural networks (DNNs) and typically are applied independently. Applying these techniques jointly via co-optimization has the potential to…

Machine Learning · Computer Science 2025-02-25 Xiaoyi Qu, David Aponte, Colby Banbury, Daniel P. Robinson, Tianyu Ding, Kazuhito Koishida, Ilya Zharkov, Tianyi Chen

Swift Cross-Dataset Pruning: Enhancing Fine-Tuning Efficiency in Natural Language Understanding

Dataset pruning aims to select a subset of a dataset for efficient model training. While data efficiency in natural language processing has primarily focused on within-corpus scenarios during model pre-training, efficient dataset pruning…

Computation and Language · Computer Science 2025-01-07 Binh-Nguyen Nguyen, Yang He

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work…

Machine Learning · Computer Science 2025-02-24 Reyhane Askari-Hemmat, Mohammad Pezeshki, Elvis Dohmatob, Florian Bordes, Pietro Astolfi, Melissa Hall, Jakob Verbeek, Michal Drozdzal, Adriana Romero-Soriano

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Joey Tianyi Zhou

Dataset Pruning: Reducing Training Data by Examining Generalization Influence

The great success of deep learning heavily relies on increasingly larger training data, which comes at a price of huge computational and infrastructural costs. This poses crucial questions that, do all training data contribute to model's…

Machine Learning · Computer Science 2023-02-28 Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, Ping Li

FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training

The increasing demand for on-device training of deep neural networks (DNNs) aims to leverage personal data for high-performance applications while addressing privacy concerns and reducing communication latency. However, resource-constrained…

Hardware Architecture · Computer Science 2026-03-31 Jinming Lu, Jiayi Tian, Hai Li, Ian Young, Zheng Zhang

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…

Machine Learning · Computer Science 2023-11-21 Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, Sijia Liu

CLIP: Train Faster with Less Data

Deep learning models require an enormous amount of data for training. However, recently there is a shift in machine learning from model-centric to data-centric approaches. In data-centric approaches, the focus is to refine and improve the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Muhammad Asif Khan, Ridha Hamila, Hamid Menouar

Dataset Condensation with Gradient Matching

As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. Many techniques have evolved over the past decade that made models lighter, faster, and…

Machine Learning · Computer Science 2022-05-25 Sabeesh Ethiraj, Bharath Kumar Bolla

Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Supervised fine-tuning (SFT) is crucial for aligning Large Language Models (LLMs) with human instructions. The primary goal during SFT is to select a small yet representative subset of training data from the larger pool, such that…

Computation and Language · Computer Science 2024-12-10 Tingyu Xia, Bowen Yu, Kai Dang, An Yang, Yuan Wu, Yuan Tian, Yi Chang, Junyang Lin