Related papers: Efficient Dataset Distillation Using Random Featur…

Dataset Meta-Learning from Kernel Ridge-Regression

One of the most fundamental aspects of any machine learning algorithm is the training data used by the algorithm. We introduce the novel concept of $\epsilon$-approximation of datasets, obtaining datasets which are much smaller than or are…

Machine Learning · Computer Science 2021-03-24 Timothy Nguyen, Zhourong Chen, Jaehoon Lee

On the Size and Approximation Error of Distilled Sets

Dataset Distillation is the task of synthesizing small datasets from large ones while still retaining comparable predictive accuracy to the original uncompressed dataset. Despite significant empirical progress in recent years, there is…

Machine Learning · Computer Science 2023-05-24 Alaa Maalouf, Murad Tukan, Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus

Dataset Distillation using Neural Feature Regression

Dataset distillation aims to learn a small synthetic dataset that preserves most of the information from the original dataset. Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the…

Machine Learning · Computer Science 2022-10-25 Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Dataset distillation has emerged as a powerful approach for reducing data requirements in deep learning. Among various methods, distribution matching-based approaches stand out for their balance of computational efficiency and strong…

Computer Vision and Pattern Recognition · Computer Science 2025-03-03 Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, Linfeng Zhang

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Contemporary machine learning requires training large neural networks on massive datasets and thus faces the challenges of high computational demands. Dataset distillation, as a recent emerging strategy, aims to compress real-world datasets…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Peng Sun, Bei Shi, Daiwei Yu, Tao Lin

Dataset Distillation with Infinitely Wide Convolutional Networks

The effectiveness of machine learning algorithms arises from being able to extract useful features from large amounts of data. As model and dataset sizes increase, dataset distillation methods that compress large datasets into significantly…

Machine Learning · Computer Science 2022-01-19 Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee

Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection

Dataset distillation provides an effective approach to reduce memory and computational costs by optimizing a compact dataset that achieves performance comparable to the full original. However, for large-scale datasets and complex deep…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Xinhao Zhong, Shuoyang Sun, Xulin Gu, Zhaoyang Xu, Yaowei Wang, Min Zhang, Bin Chen

Dataset Distillation via Curriculum Data Synthesis in Large Data Era

Dataset distillation or condensation aims to generate a smaller but representative subset from a large dataset, which allows a model to be trained more efficiently, meanwhile evaluating on the original testing data distribution to achieve…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Zeyuan Yin, Zhiqiang Shen

Dataset Distillation via Adversarial Prediction Matching

Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to persist the effect. In this paper, we approach the dataset distillation problem from a…

Computer Vision and Pattern Recognition · Computer Science 2023-12-15 Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

Dataset Distillation with Convexified Implicit Gradients

We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG), that substantially improves the state-of-the-art. To this end, we first formulate dataset distillation as a bi-level…

Machine Learning · Computer Science 2023-11-13 Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus

Dataset Distillation as Pushforward Optimal Quantization

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude less computational requirements. Existing methods can be…

Machine Learning · Computer Science 2026-02-09 Hong Ye Tan, Emma Slade

Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation

We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such…

Machine Learning · Statistics 2025-02-19 Tatsuya Aoyama, Hanting Yang, Hiroyuki Hanada, Satoshi Akahane, Tomonari Tanaka, Yoshito Okura, Yu Inatsu, Noriaki Hashimoto, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Dataset distillation has demonstrated strong performance on simple datasets like CIFAR, MNIST, and TinyImageNet but struggles to achieve similar results in more complex scenarios. In this paper, we propose EDF (emphasizes the discriminative…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, Yang You

Differentially Private Kernel Inducing Points using features from ScatterNets (DP-KIP-ScatterNet) for Privacy Preserving Data Distillation

Data distillation aims to generate a small data set that closely mimics the performance of a given learning algorithm on the original data set. The distilled dataset is hence useful to simplify the training process thanks to its small data…

Machine Learning · Computer Science 2024-04-23 Margarita Vinaroz, Mi Jung Park

Efficient Dataset Distillation for Pre-Trained Self-Supervised Models via Statistical Flow Matching

Dataset distillation seeks to synthesize a highly compact dataset that achieves performance comparable to the original dataset on downstream tasks. For the classification tasks that use pre-trained self-supervised models as backbones,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Qianxin Xia, Jiawei Du, Xin Zhang, Yuhan Zhang, Jielei Wang, Guoming Lu

Enhancing Dataset Distillation via Non-Critical Region Refinement

Dataset distillation has become a popular method for compressing large datasets into smaller, more efficient representations while preserving critical information for model training. Data features are broadly categorized into two types:…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Minh-Tuan Tran, Trung Le, Xuan-May Le, Thanh-Toan Do, Dinh Phung

DREAM: Efficient Dataset Distillation by Representative Matching

Dataset distillation aims to synthesize small datasets with little information loss from original large-scale ones for reducing storage and training costs. Recent state-of-the-art methods mainly constrain the sample synthesis process by…

Computer Vision and Pattern Recognition · Computer Science 2023-08-31 Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Wei Jiang, Yang You

From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness

Dataset Distillation (DD) compresses large datasets into compact synthetic ones that maintain training performance. However, current methods mainly target sample reduction, with limited consideration of data precision and its impact on…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 My H. Dinh, Aditya Sant, Akshay Malhotra, Keya Patani, Shahab Hamidi-Rad

Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory

Dataset distillation enables the training of deep neural networks with comparable performance in significantly reduced time by compressing large datasets into small and representative ones. Although the introduction of generative models has…

Machine Learning · Computer Science 2025-05-27 Mingzhuo Li, Guang Li, Jiafeng Mao, Takahiro Ogawa, Miki Haseyama

Kernel Distillation for Fast Gaussian Processes Prediction

Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature. However, the usage of GPs in real-world applications is limited due to their high computational cost…

Machine Learning · Statistics 2018-11-06 Congzheng Song, Yiming Sun