Related papers: DRoP: Distributionally Robust Data Pruning

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Data Pruning in Generative Diffusion Models

Data pruning is the problem of identifying a core subset that is most beneficial to training and discarding the remainder. While pruning strategies are well studied for discriminative models like those used in classification, little…

Machine Learning · Computer Science 2025-03-17 Rania Briq, Jiangtao Wang, Stefan Kesselheim

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…

Machine Learning · Computer Science 2023-11-21 Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, Sijia Liu

Learning Distributionally Robust Models at Scale via Composite Optimization

To train machine learning models that are robust to distribution shifts in the data, distributionally robust optimization (DRO) has been proven very effective. However, the existing approaches to learning a distributionally robust model…

Machine Learning · Computer Science 2022-03-21 Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Amin Karbasi

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization

Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs. Structured pruning reduces model size and speeds up inference but often causes uneven degradation across…

Computation and Language · Computer Science 2025-05-28 Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jing Li, Min Zhang, Zhaopeng Tu

Efficient Stochastic Gradient Descent for Learning with Distributionally Robust Optimization

Distributionally robust optimization (DRO) problems are increasingly seen as a viable method to train machine learning models for improved model generalization. These min-max formulations, however, are more difficult to solve. We therefore…

Machine Learning · Statistics 2020-11-03 Soumyadip Ghosh, Mark Squillante, Ebisa Wollega

CLIP: Train Faster with Less Data

Deep learning models require an enormous amount of data for training. However, recently there is a shift in machine learning from model-centric to data-centric approaches. In data-centric approaches, the focus is to refine and improve the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Muhammad Asif Khan, Ridha Hamila, Hamid Menouar

Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning

Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…

Machine Learning · Computer Science 2025-07-18 Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou

Dataset Pruning: Reducing Training Data by Examining Generalization Influence

The great success of deep learning heavily relies on increasingly larger training data, which comes at a price of huge computational and infrastructural costs. This poses crucial questions that, do all training data contribute to model's…

Machine Learning · Computer Science 2023-02-28 Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, Ping Li

Stochastic Model Pruning via Weight Dropping Away and Back

Deep neural networks have dramatically achieved great success on a variety of challenging tasks. However, most successful DNNs have an extremely complex structure, leading to extensive research on model compression. As a significant area of…

Machine Learning · Computer Science 2020-04-13 Haipeng Jia, Xueshuang Xiang, Da Fan, Meiyu Huang, Changhao Sun, Yang He

Winning the Lottery Ahead of Time: Efficient Early Network Pruning

Pruning, the task of sparsifying deep neural networks, received increasing attention recently. Although state-of-the-art pruning methods extract highly sparse models, they neglect two main challenges: (1) the process of finding these sparse…

Machine Learning · Computer Science 2022-06-22 John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann

DROP: Distributionally Robust Optimization for Multi-task Learning in Graphical Models

Gaussian Graphical Models (GGMs) are widely used to infer conditional dependence structures in high-dimensional data. However, standard precision matrix estimators are highly sensitive to data contamination, such as extreme outliers and…

Applications · Statistics 2026-03-25 Canruo Shen, Xintong Ji, Qiong Li, Wenzhi Yang, Xiaoping Shi

Disentangling the Roles of Representation and Selection in Data Pruning

Data pruning, selecting small but impactful subsets, offers a promising way to efficiently scale NLP model training. However, existing methods often involve many different design choices, which have not been systematically studied. This…

Computation and Language · Computer Science 2025-07-08 Yupei Du, Yingjin Song, Hugh Mee Wong, Daniil Ignatev, Albert Gatt, Dong Nguyen

Class-Aware Pruning for Efficient Neural Networks

Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g.,…

Artificial Intelligence · Computer Science 2024-02-20 Mengnan Jiang, Jingcun Wang, Amro Eldebiky, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Grace Li Zhang

Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation

Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and…

Machine Learning · Computer Science 2021-06-18 Agnieszka Słowik, Léon Bottou

Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning

Deep neural networks (DNN) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate the problem of…

Machine Learning · Computer Science 2022-12-21 Ramya Hebbalaguppe, Rishabh Patra, Tirtharaj Dash, Gautam Shroff, Lovekesh Vig

PUMA: margin-based data pruning

Deep learning has been able to outperform humans in terms of classification accuracy in many tasks. However, to achieve robustness to adversarial perturbations, the best methodologies require to perform adversarial training on a much larger…

Machine Learning · Computer Science 2024-05-13 Javier Maroto, Pascal Frossard

Manifold Regularized Dynamic Network Pruning

Neural network pruning is an essential approach for reducing the computational complexity of deep models so that they can be well deployed on resource-limited devices. Compared with conventional methods, the recently developed dynamic…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, Dacheng Tao, Chang Xu

Aligning Distributionally Robust Optimization with Practical Deep Learning Needs

While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between…

Machine Learning · Computer Science 2025-09-26 Dmitrii Feoktistov, Igor Ignashin, Andrey Veprikov, Nikita Borovko, Alexander Bogdanov, Savelii Chezhegov, Aleksandr Beznosikov

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require…

Machine Learning · Computer Science 2025-06-13 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun