Related papers: Semantic distillation: a method for clustering obj…

SAS: Semantic-aware Sampling for Generative Dataset Distillation

Deep neural networks have achieved impressive performance across a wide range of tasks, but this success often comes with substantial computational and storage costs due to large-scale training data. Dataset distillation addresses this…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Mingzhuo Li, Guang Li, Linfeng Ye, Jiafeng Mao, Takahiro Ogawa, Konstantinos N. Plataniotis, Miki Haseyama

Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification

Dataset distillation aims at synthesizing a dataset by a small number of artificially generated data items, which, when used as training data, reproduce or approximate a machine learning (ML) model as if it were trained on the entire…

Machine Learning · Computer Science · 2024-03-27 · Radu-Andrei Rosu, Mihaela-Elena Breaban, Henri Luchian

Dataset Distillation by Matching Training Trajectories

Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-23 · George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Data Distillation: A Survey

The popularity of deep learning has led to the curation of a vast number of massive and multifarious datasets. Despite having close-to-human performance on individual tasks, training parameter-hungry models on large datasets poses…

Machine Learning · Computer Science · 2023-09-27 · Noveen Sachdeva, Julian McAuley

Exploring Multilingual Text Data Distillation

With the rise of deep learning, large datasets and complex models have become common, requiring significant computing power. To address this, data distillation has emerged as a technique to quickly train models with lower memory and time…

Computation and Language · Computer Science · 2023-08-10 · Shivam Sahni, Harsh Patel

Dataset Distillation as Pushforward Optimal Quantization

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude less computational requirements. Existing methods can be…

Machine Learning · Computer Science · 2026-02-09 · Hong Ye Tan, Emma Slade

Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate,…

Databases · Computer Science · 2012-03-20 · Saptarsi Goswami, Amlan Chakrabarti

DIVER: Diving Deeper into Distilled Data via Expressive Semantic Recovery

Dataset distillation aims to synthesize a compact proxy dataset that is unreadable or non-raw from the original dataset for privacy protection and highly efficient learning. However, previous approaches typically adopt a single-stage…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-26 · Qianxin Xia, Zhiyong Shu, Wenbo Jiang, Jiawei Du, Jielei Wang, Guoming Lu

Distillation of Diffusion Features for Semantic Correspondence

Semantic correspondence, the task of determining relationships between different parts of images, underpins various applications including 3D reconstruction, image-to-image translation, object tracking, and visual place recognition. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-05 · Frank Fundel, Johannes Schusterbauer, Vincent Tao Hu, Björn Ommer

A Comprehensive Survey of Dataset Distillation

Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing…

Machine Learning · Computer Science · 2023-12-27 · Shiye Lei, Dacheng Tao

Using Genetic Algorithms for Texts Classification Problems

The avalanche quantity of the information developed by mankind has led to concept of automation of knowledge extraction - Data Mining ([1]). This direction is connected with a wide spectrum of problems - from recognition of the fuzzy set to…

Machine Learning · Computer Science · 2009-06-05 · A. A. Shumeyko, S. L. Sotnik

Soft-Label Dataset Distillation and Text Dataset Distillation

Dataset distillation is a method for reducing dataset sizes by learning a small number of synthetic samples containing all the information of a large dataset. This has several benefits like speeding up model training, reducing energy…

Machine Learning · Computer Science · 2022-06-10 · Ilia Sucholutsky, Matthias Schonlau

Image Distillation for Safe Data Sharing in Histopathology

Histopathology can help clinicians make accurate diagnoses, determine disease prognosis, and plan appropriate treatment strategies. As deep learning techniques prove successful in the medical domain, the primary challenges become limited…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-11 · Zhe Li, Bernhard Kainz

What Are We Really Measuring? Rethinking Dataset Bias in Web-Scale Natural Image Collections via Unsupervised Semantic Clustering

In computer vision, a prevailing method for quantifying dataset bias is to train a model to distinguish between datasets. High classification accuracy is then interpreted as evidence of meaningful semantic differences. This approach assumes…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-16 · Amir Hossein Saleknia, Mohammad Sabokrou

Sequential Subset Matching for Dataset Distillation

Dataset distillation is a newly emerging task that synthesizes a small-size dataset used in training deep neural networks (DNNs) for reducing data storage and model training costs. The synthetic datasets are expected to capture the essence…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-06 · Jiawei Du, Qin Shi, Joey Tianyi Zhou

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-21 · George Cazenavette, Antonio Torralba, Vincent Sitzmann

Distill: Domain-Specific Compilation for Cognitive Models

This paper discusses our proposal and implementation of Distill, a domain-specific compilation tool based on LLVM to accelerate cognitive models. Cognitive models explain the process of cognitive function and offer a path to human-like…

Programming Languages · Computer Science · 2022-01-17 · Jan Vesely, Raghavendra Pradyumna Pothukuchi, Ketaki Joshi, Samyak Gupta, Jonathan D. Cohen, Abhishek Bhattacharjee

EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics

Dataset distillation aims to synthesize a compact dataset from the original large-scale one, enabling highly efficient learning while preserving competitive model performance. However, traditional techniques primarily capture low-level…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-14 · Qianxin Xia, Jiawei Du, Guoming Lu, Zhiyong Shu, Jielei Wang

Deep Descriptive Clustering

Recent work on explainable clustering allows describing clusters when the features are interpretable. However, much modern machine learning focuses on complex data such as images, text, and graphs where deep learning is used but the raw…

Machine Learning · Computer Science · 2021-05-26 · Hongjing Zhang, Ian Davidson

Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks

Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating costs of optimization and data storage. However, progress remains largely empirical.…

Machine Learning · Computer Science · 2026-03-31 · Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi