Related papers: Towards Consistent and Efficient Dataset Distillat…

Dataset Distillation via Adversarial Prediction Matching

Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to persist the effect. In this paper, we approach the dataset distillation problem from a…

Computer Vision and Pattern Recognition · Computer Science 2023-12-15 Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

One Category One Prompt: Dataset Distillation using Diffusion Models

The extensive amounts of data required for training deep neural networks pose significant challenges on storage and transmission fronts. Dataset distillation has emerged as a promising technique to condense the information of massive…

Computer Vision and Pattern Recognition · Computer Science 2024-03-13 Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri

Foreground-Aware Dataset Distillation via Dynamic Patch Selection

In this paper, we propose a foreground-aware dataset distillation method that enhances patch selection in a content-adaptive manner. With the rising computational cost of training large-scale deep models, dataset distillation has emerged as…

Computer Vision and Pattern Recognition · Computer Science 2026-01-07 Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Dataset Distillation with Probabilistic Latent Features

As deep learning models grow in complexity and the volume of training data increases, reducing storage and computational costs becomes increasingly important. Dataset distillation addresses this challenge by synthesizing a compact set of…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Zhe Li, Sarah Cechnicka, Cheng Ouyang, Katharina Breininger, Peter Schüffler, Bernhard Kainz

Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory

Dataset distillation enables the training of deep neural networks with comparable performance in significantly reduced time by compressing large datasets into small and representative ones. Although the introduction of generative models has…

Machine Learning · Computer Science 2025-05-27 Mingzhuo Li, Guang Li, Jiafeng Mao, Takahiro Ogawa, Miki Haseyama

Dataset Distillation as Pushforward Optimal Quantization

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude less computational requirements. Existing methods can be…

Machine Learning · Computer Science 2026-02-09 Hong Ye Tan, Emma Slade

Efficient Dataset Distillation for Pre-Trained Self-Supervised Models via Statistical Flow Matching

Dataset distillation seeks to synthesize a highly compact dataset that achieves performance comparable to the original dataset on downstream tasks. For classification tasks that use pre-trained self-supervised models as backbones,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Qianxin Xia, Jiawei Du, Xin Zhang, Yuhan Zhang, Jielei Wang, Guoming Lu

Latent Video Dataset Distillation

Dataset distillation has demonstrated remarkable effectiveness in high-compression scenarios for image datasets. While video datasets inherently contain greater redundancy, existing video dataset distillation methods primarily focus on…

Computer Vision and Pattern Recognition · Computer Science 2025-04-29 Ning Li, Antai Andy Liu, Jingran Zhang, Justin Cui

Towards Realistic Remote Sensing Dataset Distillation with Discriminative Prototype-guided Diffusion

Recent years have witnessed the remarkable success of deep learning in remote sensing image interpretation, driven by the availability of large-scale benchmark datasets. However, this reliance on massive training data also brings two major…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Yonghao Xu, Pedram Ghamisi, Qihao Weng

Video Dataset Condensation with Diffusion Models

In recent years, the rapid expansion of dataset sizes and the increasing complexity of deep learning models have significantly escalated the demand for computational resources, both for data storage and model training. Dataset distillation…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Zhe Li, Hadrien Reynaud, Mischa Dombrowski, Sarah Cechnicka, Franciskus Xaverius Erick, Bernhard Kainz

Efficient Dataset Distillation via Minimax Diffusion

Dataset distillation reduces the storage and computational consumption of training a network by generating a small surrogate dataset that encapsulates rich information of the original large-scale one. However, previous distillation methods…

Computer Vision and Pattern Recognition · Computer Science 2024-03-26 Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen

Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment

The high cost and accessibility problem associated with large datasets hinder the development of large-scale visual recognition systems. Dataset Distillation addresses these problems by synthesizing compact surrogate datasets for efficient…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Tongfei Liu, Yufan Liu, Bing Li, Weiming Hu

FocusDD: Real-World Scene Infusion for Robust Dataset Distillation

Dataset distillation has emerged as a strategy to compress real-world datasets for efficient training. However, it struggles with large-scale and high-resolution datasets, limiting its practicality. This paper introduces a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Youbing Hu, Yun Cheng, Olga Saukh, Firat Ozdemir, Anqi Lu, Zhiqiang Cao, Zhijun Li

Diffusion-Augmented Coreset Expansion for Scalable Dataset Distillation

With the rapid scaling of neural networks, data storage and communication demands have intensified. Dataset distillation has emerged as a promising solution, condensing information from extensive datasets into a compact set of synthetic…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Ali Abbasi, Shima Imani, Chenyang An, Gayathri Mahalingam, Harsh Shrivastava, Maurice Diesendruck, Hamed Pirsiavash, Pramod Sharma, Soheil Kolouri

Generalizing Dataset Distillation via Deep Generative Prior

Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model…

Computer Vision and Pattern Recognition · Computer Science 2023-05-05 George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Curriculum Dataset Distillation

Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still…

Computer Vision and Pattern Recognition · Computer Science 2025-07-14 Zhiheng Ma, Anjia Cao, Funing Yang, Yihong Gong, Xing Wei

A Comprehensive Survey of Dataset Distillation

Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing…

Machine Learning · Computer Science 2023-12-27 Shiye Lei, Dacheng Tao

Distributional Dataset Distillation with Subtask Decomposition

What does a neural network learn when training from a task-specific dataset? Synthesizing this knowledge is the central idea behind Dataset Distillation, which recent work has shown can be used to compress large datasets into a small set of…

Machine Learning · Computer Science 2024-03-05 Tian Qin, Zhiwei Deng, David Alvarez-Melis

MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Jeffrey A. Chan-Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah

DataDAM: Efficient Dataset Distillation with Attention Matching

Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets. Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Ahmad Sajedi, Samir Khaki, Ehsan Amjadian, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis