English
Related papers

Related papers: Extracting alignment data in open models

200 papers

Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge…

Machine Learning · Computer Science 2020-02-26 Tongzhou Wang , Jun-Yan Zhu , Antonio Torralba , Alexei A. Efros

Deep learning techniques have achieved great success in many fields, while at the same time deep learning models are getting more complex and expensive to compute. It severely hinders the wide applications of these models. In order to…

Computation and Language · Computer Science 2021-04-20 Yongqi Li , Wenjie Li

Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it…

Computation and Language · Computer Science 2022-09-12 Asahi Ushio , Jose Camacho-Collados , Steven Schockaert

Conventional approaches to relation extraction usually require a fixed set of pre-defined relations. Such requirement is hard to meet in many real applications, especially when new data and relations are emerging incessantly and it is…

Computation and Language · Computer Science 2019-03-27 Hong Wang , Wenhan Xiong , Mo Yu , Xiaoxiao Guo , Shiyu Chang , William Yang Wang

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of…

Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that…

Computer Vision and Pattern Recognition · Computer Science 2022-03-23 George Cazenavette , Tongzhou Wang , Antonio Torralba , Alexei A. Efros , Jun-Yan Zhu

Using huge training datasets can be costly and inconvenient. This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks. Inspired by recent ideas, we suggest…

Machine Learning · Computer Science 2022-03-17 Dmitry Medvedev , Alexander D'yakonov

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for…

Machine Learning · Computer Science 2020-08-24 Rohan Anil , Gabriel Pereyra , Alexandre Passos , Robert Ormandi , George E. Dahl , Geoffrey E. Hinton

Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data mixtures is typically prohibitively expensive…

Computation and Language · Computer Science 2024-12-10 Clara Na , Ian Magnusson , Ananya Harsh Jha , Tom Sherborne , Emma Strubell , Jesse Dodge , Pradeep Dasigi

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used…

Machine Learning · Computer Science 2024-07-23 William Yang , Ye Zhu , Zhiwei Deng , Olga Russakovsky

Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous…

Machine Learning · Computer Science 2021-07-27 Shagun Sodhani , Olivier Delalleau , Mahmoud Assran , Koustuv Sinha , Nicolas Ballas , Michael Rabbat

In most of neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher). The highest-scoring hypothesis of the teacher model is commonly used to train a new model…

Computation and Language · Computer Science 2021-04-02 Vilém Zouhar

Knowledge distillation introduced in the deep learning context is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed in…

Machine Learning · Computer Science 2020-10-27 Hossein Mobahi , Mehrdad Farajtabar , Peter L. Bartlett

The rapid advancement of large language models (LLMs) has significantly advanced the capabilities of artificial intelligence across various domains. However, their massive scale and high computational costs render them unsuitable for direct…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Miao Rang , Zhenni Bi , Hang Zhou , Hanting Chen , An Xiao , Tianyu Guo , Kai Han , Xinghao Chen , Yunhe Wang

Dataset distillation aims to compress training data into fewer examples via a teacher, from which a student can learn effectively. While its success is often attributed to structure in the data, modern neural networks also memorize specific…

Machine Learning · Computer Science 2026-02-23 Freya Behrens , Lenka Zdeborová

Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student…

Machine Learning · Computer Science 2021-05-21 Abdolmaged Alkhulaifi , Fahad Alsahli , Irfan Ahmad

Recent advances in deep learning has lead to rapid developments in the field of image retrieval. However, the best performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Zakaria Laskar , Juho Kannala

Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06,HVD15]. Despite many practical applications, basic questions about the extent to which models can be…

Machine Learning · Computer Science 2024-05-07 Enric Boix-Adsera

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods…

Computer Vision and Pattern Recognition · Computer Science 2025-11-21 George Cazenavette , Antonio Torralba , Vincent Sitzmann

Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset…

Machine Learning · Computer Science 2023-07-18 Murad Tukan , Alaa Maalouf , Margarita Osadchy
‹ Prev 1 2 3 10 Next ›