English
Related papers

Related papers: Scale Efficient Training for Large Datasets

200 papers

Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require…

Machine Learning · Computer Science 2025-06-13 Yeseul Cho , Baekrok Shin , Changmin Kang , Chulhee Yun

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju , Kyle Daruwalla , Mikko Lipasti

Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant, datasets from these domains presents a…

Machine Learning · Computer Science 2026-03-03 Wei Chen , Junle Chen , Yuqian Wu , Yuxuan Liang , Xiaofang Zhou

Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model…

Machine Learning · Computer Science 2025-06-23 Sebastian Schmidt , Prasanga Dhungel , Christoffer Löffler , Björn Nieth , Stephan Günnemann , Leo Schwinn

The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently…

Machine Learning · Computer Science 2024-04-29 Raphael Ruschel , A. S. M. Iftekhar , B. S. Manjunath , Suya You

The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this…

Machine Learning · Computer Science 2024-06-17 Muyang He , Shuo Yang , Tiejun Huang , Bo Zhao

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset from the full dataset with comparable…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Furui Xu , Shaobo Wang , Jiajun Zhang , Chenghao Sun , Haixiang Tang , Linfeng Zhang

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset…

Image and Video Processing · Electrical Eng. & Systems 2024-06-11 Brian B. Moser , Federico Raue , Andreas Dengel

Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…

Machine Learning · Computer Science 2025-07-18 Suorong Yang , Peijia Li , Yujie Liu , Zhiming Xu , Peng Ye , Wanli Ouyang , Furao Shen , Dongzhan Zhou

Structured pruning and quantization are fundamental techniques used to reduce the size of deep neural networks (DNNs) and typically are applied independently. Applying these techniques jointly via co-optimization has the potential to…

Machine Learning · Computer Science 2025-02-25 Xiaoyi Qu , David Aponte , Colby Banbury , Daniel P. Robinson , Tianyu Ding , Kazuhito Koishida , Ilya Zharkov , Tianyi Chen

Dataset pruning aims to select a subset of a dataset for efficient model training. While data efficiency in natural language processing has primarily focused on within-corpus scenarios during model pre-training, efficient dataset pruning…

Computation and Language · Computer Science 2025-01-07 Binh-Nguyen Nguyen , Yang He

Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work…

Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Xin Zhang , Jiawei Du , Yunsong Li , Weiying Xie , Joey Tianyi Zhou

The great success of deep learning heavily relies on increasingly larger training data, which comes at a price of huge computational and infrastructural costs. This poses crucial questions that, do all training data contribute to model's…

Machine Learning · Computer Science 2023-02-28 Shuo Yang , Zeke Xie , Hanyu Peng , Min Xu , Mingming Sun , Ping Li

The increasing demand for on-device training of deep neural networks (DNNs) aims to leverage personal data for high-performance applications while addressing privacy concerns and reducing communication latency. However, resource-constrained…

Hardware Architecture · Computer Science 2026-03-31 Jinming Lu , Jiayi Tian , Hai Li , Ian Young , Zheng Zhang

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…

Machine Learning · Computer Science 2023-11-21 Yihua Zhang , Yimeng Zhang , Aochuan Chen , Jinghan Jia , Jiancheng Liu , Gaowen Liu , Mingyi Hong , Shiyu Chang , Sijia Liu

Deep learning models require an enormous amount of data for training. However, recently there is a shift in machine learning from model-centric to data-centric approaches. In data-centric approaches, the focus is to refine and improve the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Muhammad Asif Khan , Ridha Hamila , Hamid Menouar

As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Bo Zhao , Konda Reddy Mopuri , Hakan Bilen

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. Many techniques have evolved over the past decade that made models lighter, faster, and…

Machine Learning · Computer Science 2022-05-25 Sabeesh Ethiraj , Bharath Kumar Bolla

Supervised fine-tuning (SFT) is crucial for aligning Large Language Models (LLMs) with human instructions. The primary goal during SFT is to select a small yet representative subset of training data from the larger pool, such that…

Computation and Language · Computer Science 2024-12-10 Tingyu Xia , Bowen Yu , Kai Dang , An Yang , Yuan Wu , Yuan Tian , Yi Chang , Junyang Lin
‹ Prev 1 2 3 10 Next ›