Related papers: Extracting alignment data in open models

Dataset Distillation

Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge…

Machine Learning · Computer Science 2020-02-26 Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros

Data Distillation for Text Classification

Deep learning techniques have achieved great success in many fields, while at the same time deep learning models are getting more complex and expensive to compute. It severely hinders the wide applications of these models. In order to…

Computation and Language · Computer Science 2021-04-20 Yongqi Li, Wenjie Li

Distilling Relation Embeddings from Pre-trained Language Models

Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it…

Computation and Language · Computer Science 2022-09-12 Asahi Ushio, Jose Camacho-Collados, Steven Schockaert

Sentence Embedding Alignment for Lifelong Relation Extraction

Conventional approaches to relation extraction usually require a fixed set of pre-defined relations. Such a requirement is hard to meet in many real applications, especially when new data and relations are emerging incessantly and it is…

Computation and Language · Computer Science 2019-03-27 Hong Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang

Scalable Extraction of Training Data from (Production) Language Models

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of…

Machine Learning · Computer Science 2023-11-29 Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

Dataset Distillation by Matching Training Trajectories

Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that…

Computer Vision and Pattern Recognition · Computer Science 2022-03-23 George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation

Using huge training datasets can be costly and inconvenient. This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks. Inspired by recent ideas, we suggest…

Machine Learning · Computer Science 2022-03-17 Dmitry Medvedev, Alexander D'yakonov

Large scale distributed neural network training through online distillation

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for…

Machine Learning · Computer Science 2020-08-24 Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging

Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data mixtures is typically prohibitively expensive…

Computation and Language · Computer Science 2024-12-10 Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi

What is Dataset Distillation Learning?

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used…

Machine Learning · Computer Science 2024-07-23 William Yang, Ye Zhu, Zhiwei Deng, Olga Russakovsky

A Closer Look at Codistillation for Distributed Training

Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous…

Machine Learning · Computer Science 2021-07-27 Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

Sampling and Filtering of Neural Machine Translation Distillation Data

In most neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher). The highest-scoring hypothesis of the teacher model is commonly used to train a new model…

Computation and Language · Computer Science 2021-04-02 Vilém Zouhar

Self-Distillation Amplifies Regularization in Hilbert Space

Knowledge distillation introduced in the deep learning context is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed in…

Machine Learning · Computer Science 2020-10-27 Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation

The rapid advancement of large language models (LLMs) has significantly advanced the capabilities of artificial intelligence across various domains. However, their massive scale and high computational costs render them unsuitable for direct…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Miao Rang, Zhenni Bi, Hang Zhou, Hanting Chen, An Xiao, Tianyu Guo, Kai Han, Xinghao Chen, Yunhe Wang

Dataset distillation for memorized data: Soft labels can leak held-out teacher knowledge

Dataset distillation aims to compress training data into fewer examples via a teacher, from which a student can learn effectively. While its success is often attributed to structure in the data, modern neural networks also memorize specific…

Machine Learning · Computer Science 2026-02-23 Freya Behrens, Lenka Zdeborová

Knowledge Distillation in Deep Learning and its Applications

Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student…

Machine Learning · Computer Science 2021-05-21 Abdolmaged Alkhulaifi, Fahad Alsahli, Irfan Ahmad

Data-Efficient Ranking Distillation for Image Retrieval

Recent advances in deep learning have led to rapid developments in the field of image retrieval. However, the best performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Zakaria Laskar, Juho Kannala

Towards a theory of model distillation

Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06,HVD15]. Despite many practical applications, basic questions about the extent to which models can be…

Machine Learning · Computer Science 2024-05-07 Enric Boix-Adsera

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods…

Computer Vision and Pattern Recognition · Computer Science 2025-11-21 George Cazenavette, Antonio Torralba, Vincent Sitzmann

Dataset Distillation Meets Provable Subset Selection

Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset…

Machine Learning · Computer Science 2023-07-18 Murad Tukan, Alaa Maalouf, Margarita Osadchy