Related papers: Dataset Distillation as Pushforward Optimal Quanti…

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

Dataset distillation aims to minimize the time and memory needed for training deep networks on large datasets, by creating a small set of synthetic images that has a similar generalization performance to that of the full dataset. However,…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-12 · Xuxi Chen, Yu Yang, Zhangyang Wang, Baharan Mirzasoleiman

Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection

Dataset distillation provides an effective approach to reduce memory and computational costs by optimizing a compact dataset that achieves performance comparable to the full original. However, for large-scale datasets and complex deep…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-14 · Xinhao Zhong, Shuoyang Sun, Xulin Gu, Zhaoyang Xu, Yaowei Wang, Min Zhang, Bin Chen

Dataset Distillation by Matching Training Trajectories

Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-23 · George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment

The high cost and accessibility problem associated with large datasets hinder the development of large-scale visual recognition systems. Dataset Distillation addresses these problems by synthesizing compact surrogate datasets for efficient…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Tongfei Liu, Yufan Liu, Bing Li, Weiming Hu

A Comprehensive Survey of Dataset Distillation

Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing…

Machine Learning · Computer Science · 2023-12-27 · Shiye Lei, Dacheng Tao

Dataset Distillation via Adversarial Prediction Matching

Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to persist the effect. In this paper, we approach the dataset distillation problem from a…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-15 · Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

Dataset Distillation Meets Provable Subset Selection

Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset…

Machine Learning · Computer Science · 2023-07-18 · Murad Tukan, Alaa Maalouf, Margarita Osadchy

Dataset Distillation via Curriculum Data Synthesis in Large Data Era

Dataset distillation or condensation aims to generate a smaller but representative subset from a large dataset, which allows a model to be trained more efficiently, meanwhile evaluating on the original testing data distribution to achieve…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-26 · Zeyuan Yin, Zhiqiang Shen

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Although larger datasets are crucial for training large deep models, the rapid growth of dataset size has brought a significant challenge in terms of considerable training costs, which even results in prohibitive computational expenses.…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-06 · Sheng-Feng Yu, Jia-Jiun Yao, Wei-Chen Chiu

Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks

Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating costs of optimization and data storage. However, progress remains largely empirical.…

Machine Learning · Computer Science · 2026-03-31 · Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi

Dataset Distillation

Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge…

Machine Learning · Computer Science · 2020-02-26 · Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros

Dataset Distillation with Probabilistic Latent Features

As deep learning models grow in complexity and the volume of training data increases, reducing storage and computational costs becomes increasingly important. Dataset distillation addresses this challenge by synthesizing a compact set of…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Zhe Li, Sarah Cechnicka, Cheng Ouyang, Katharina Breininger, Peter Schüffler, Bernhard Kainz

Image Distillation for Safe Data Sharing in Histopathology

Histopathology can help clinicians make accurate diagnoses, determine disease prognosis, and plan appropriate treatment strategies. As deep learning techniques prove successful in the medical domain, the primary challenges become limited…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-11 · Zhe Li, Bernhard Kainz

Self-Supervised Dataset Distillation for Transfer Learning

Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for…

Machine Learning · Computer Science · 2024-04-15 · Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

Distributional Dataset Distillation with Subtask Decomposition

What does a neural network learn when training from a task-specific dataset? Synthesizing this knowledge is the central idea behind Dataset Distillation, which recent work has shown can be used to compress large datasets into a small set of…

Machine Learning · Computer Science · 2024-03-05 · Tian Qin, Zhiwei Deng, David Alvarez-Melis

D⁴M: Dataset Distillation via Disentangled Diffusion Model

Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. To imitate the performance of the original dataset, most approaches employ bi-level optimization and the distillation space…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-23 · Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, Bowen Tang

Dataset Distillation in Latent Space

Dataset distillation (DD) is a newly emerging research area aiming at alleviating the heavy computational load in training models on large datasets. It tries to distill a large dataset into a small and condensed one so that models trained…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-28 · Yuxuan Duan, Jianfu Zhang, Liqing Zhang

Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory

Dataset Distillation is a newly emerging area that aims to distill large datasets into much smaller and highly informative synthetic ones to accelerate training and reduce storage. Among various dataset distillation methods,…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-02 · Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

Generalizing Dataset Distillation via Deep Generative Prior

Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-05 · George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter.…

Machine Learning · Computer Science · 2024-09-04 · Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen