Related papers: Small Scale Data-Free Knowledge Distillation

Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis

Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression, which learns a compact network (student) by transferring the knowledge from a pre-trained, over-parameterized network (teacher). In…

Machine Learning · Computer Science · 2021-04-13 · Zi Wang

Data Efficient Stagewise Knowledge Distillation

Despite the success of Deep Learning (DL), the deployment of modern DL models requiring large computational power poses a significant problem for resource-constrained systems. This necessitates building compact networks that reduce…

Machine Learning · Computer Science · 2020-06-24 · Akshay Kulkarni, Navid Panchi, Sharath Chandra Raparthy, Shital Chiddarwar

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods such as supervised KD and on-policy KD are adversely impacted by the…

Computation and Language · Computer Science · 2025-04-29 · Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei Li, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

An Embarrassingly Simple Approach for Knowledge Distillation

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-10 · Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Zero-Shot Knowledge Distillation in Deep Networks

Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted…

Machine Learning · Computer Science · 2019-05-21 · Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty

Hybrid Data-Free Knowledge Distillation

Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generation-based methods train…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-19 · Jialiang Tang, Shuo Chen, Chen Gong

Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay

Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of…

Machine Learning · Computer Science · 2024-07-30 · Kuluhan Binici, Shivam Aggarwal, Nam Trung Pham, Karianto Leman, Tulika Mitra

Synthetic data generation method for data-free knowledge distillation in regression neural networks

Knowledge distillation is the technique of compressing a larger neural network, known as the teacher, into a smaller neural network, known as the student, while still trying to maintain the performance of the larger neural network as much…

Machine Learning · Computer Science · 2023-05-11 · Tianxun Zhou, Keng-Hwee Chiam

Dynamic Rectification Knowledge Distillation

Knowledge Distillation is a technique which aims to utilize dark knowledge to compress and transfer information from a vast, well-trained neural network (teacher model) to a smaller, less capable neural network (student model) with improved…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-28 · Fahad Rahman Amik, Ahnaf Ismat Tasin, Silvia Ahmed, M. M. Lutfe Elahi, Nabeel Mohammed

Black-box Few-shot Knowledge Distillation

Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network. Traditional KD methods require lots of labeled training samples and a white-box teacher…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-26 · Dang Nguyen, Sunil Gupta, Kien Do, Svetha Venkatesh

Data-Free Adversarial Distillation

Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large…

Machine Learning · Computer Science · 2020-03-03 · Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song

Extracting knowledge from features with multilevel abstraction

Knowledge distillation aims at transferring the knowledge from a large teacher model to a small student model with great improvements in the performance of the student model. Therefore, the student network can replace the teacher network to…

Machine Learning · Computer Science · 2021-12-28 · Jinhong Lin, Zhaoyang Li

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models, where a pre-trained teacher model is used to facilitate the training of the target student model. However, the availability of a…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-17 · Xucong Wang, Pengchao Han, Lei Guo

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks

Dataset distillation (DD) generates small synthetic datasets that can efficiently train deep networks with a limited amount of memory and compute. Despite the success of DD methods for supervised learning, DD for self-supervised…

Machine Learning · Computer Science · 2025-04-02 · Siddharth Joshi, Jiayi Ni, Baharan Mirzasoleiman

Knowledge Distillation with Deep Supervision

Knowledge distillation aims to enhance the performance of a lightweight student model by exploiting the knowledge from a pre-trained cumbersome teacher model. However, in the traditional knowledge distillation, teacher predictions are only…

Machine Learning · Computer Science · 2023-05-26 · Shiya Luo, Defang Chen, Can Wang

Learning from a Lightweight Teacher for Efficient Knowledge Distillation

Knowledge Distillation (KD) is an effective framework for compressing deep learning models, realized by a student-teacher paradigm requiring small student networks to mimic the soft target generated by well-trained teachers. However, the…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-20 · Yuang Liu, Wei Zhang, Jun Wang

On the Efficiency of Subclass Knowledge Distillation in Classification Tasks

This work introduces a novel knowledge distillation framework for classification tasks where information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary…

Machine Learning · Computer Science · 2022-07-06 · Ahmad Sajedi, Konstantinos N. Plataniotis

Condensed Data Expansion Using Model Inversion for Knowledge Distillation

Condensed datasets offer a compact representation of larger datasets, but training models directly on them or using them to enhance model performance through knowledge distillation (KD) can result in suboptimal outcomes due to limited…

Machine Learning · Computer Science · 2025-11-11 · Kuluhan Binici, Shivam Aggarwal, Cihan Acar, Nam Trung Pham, Karianto Leman, Gim Hee Lee, Tulika Mitra

Distilling the Knowledge in Data Pruning

With the increasing size of datasets used for training neural networks, data pruning becomes an attractive field of research. However, most current data pruning algorithms are limited in their ability to preserve accuracy compared to models…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-15 · Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni

Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable due to some…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-24 · Jialiang Tang, Shuo Chen, Gang Niu, Masashi Sugiyama, Chen Gong
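Most of the abstracts above share the same teacher-student setup: a small student is trained to match the output distribution of a large pre-trained teacher. The sketch below illustrates that setup with the widely used temperature-scaled soft-target loss. It is not the method of any paper listed here. The loss combines a KL term between the softened teacher and student distributions with ordinary cross-entropy on the true label. The function names, the default temperature of 4.0 and the weighting alpha of 0.5 are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # illustrative default, not taken from any listed paper
    alpha: float = 0.5,        # illustrative weight between soft and hard terms
) -> float:
    """Hypothetical helper: soft-target distillation loss for a single example."""
    # Soft term: the student matches the teacher's softened distribution.
    # Scaling by T^2 keeps this term's gradient magnitude roughly stable as T changes.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Hard term: ordinary cross-entropy against the ground-truth label at T = 1.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(kd_loss([1.0, 2.0, 3.0], teacher, label=2))  # soft term is zero here
    print(kd_loss([3.0, 2.0, 1.0], teacher, label=2))  # student disagrees, loss grows
```

When the student's logits equal the teacher's, the soft term vanishes and only the weighted cross-entropy remains. The T² factor is the usual way to keep the soft term on a comparable scale as the temperature is raised.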