Related papers: Knowledge Distillation via Instance-level Sequence…

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

With the success of deep neural networks, knowledge distillation, which guides the learning of a small student network from a large teacher network, is being actively studied for model compression and transfer learning. However, few studies…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-10 · Wonchul Son, Jaemin Na, Junyong Choi, Wonjun Hwang

Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge via teacher-student learning, where most existing distillation approaches apply a fixed pre-trained model as teacher to supervise the learning of student network. This…

Machine Learning · Computer Science · 2021-03-26 · Kangkai Zhang, Chunhui Zhang, Shikun Li, Dan Zeng, Shiming Ge

Efficient Knowledge Distillation via Curriculum Extraction

Knowledge distillation is a technique used to train a small student network using the output generated by a large teacher network, and has many empirical advantages (Hinton et al., 2015). While the standard one-shot approach to…

Machine Learning · Computer Science · 2025-03-25 · Shivam Gupta, Sushrut Karmalkar
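The abstract above refers to the standard one-shot setup, in which the student learns from the teacher's outputs. As a rough sketch of that baseline (not of the curriculum-extraction method itself), the temperature-softened loss usually attributed to Hinton et al. (2015) can be written in plain Python. The function names and the defaults `temperature=4.0` and `alpha=0.9` are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; larger temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # illustrative default (assumption)
    alpha: float = 0.9,        # weight on the soft term (assumption)
) -> float:
    """Weighted sum of a soft-target term and hard-label cross-entropy.

    The soft term is multiplied by T^2 so its gradient magnitude stays
    comparable to the hard term as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])  # ordinary cross-entropy at T = 1
    return alpha * soft + (1 - alpha) * hard
```

KL divergence is used for the soft term; it differs from cross-entropy against the teacher's soft targets only by the teacher's entropy, which does not depend on the student, so both yield the same gradients for the student.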

Recurrent knowledge distillation

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-21 · Silvia L. Pintea, Yue Liu, Jan C. van Gemert

Distilling Knowledge via Knowledge Review

Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network. Previous methods mostly focus on proposing feature transformation and loss…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-20 · Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia

Learning Student-Friendly Teacher Networks for Knowledge Distillation

We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained…

Machine Learning · Computer Science · 2022-01-25 · Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Dae Sin Kim, Bohyung Han

Distilling Knowledge via Intermediate Classifiers

The crux of knowledge distillation is to effectively train a resource-limited student model under the guidance of a pre-trained larger teacher model. However, when there is a large difference between the model complexities of teacher and…

Machine Learning · Computer Science · 2021-06-01 · Aryan Asadian, Amirali Salehi-Abari

Distilling Calibrated Student from an Uncalibrated Teacher

Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-23 · Ishan Mishra, Sethu Vamsi Krishna, Deepak Mishra
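The abstract is truncated before it says how calibration is measured. One widely used measure is expected calibration error (ECE), which compares a model's confidence with its accuracy. The sketch below shows ECE with equal-width confidence bins purely as an illustrative assumption; it is not necessarily the metric or binning used in this paper.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE: bin predictions by confidence, then average |accuracy - confidence|
    over bins, weighted by the fraction of samples that fall in each bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

For example, a model that always reports 95% confidence but is right only half the time has an ECE of about 0.45, while one whose 90%-confidence predictions are correct 9 times out of 10 has an ECE near zero.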

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Whatever its type, the guiding knowledge used for distillation is always kept unchanged for…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-01 · Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-05 · Frederick Tung, Greg Mori

Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation

Knowledge distillation usually transfers the knowledge from a pre-trained cumbersome teacher network to a compact student network, which follows the classical teacher-teaching-student paradigm. Based on this paradigm, previous methods…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-14 · Zheng Li, Xiang Li, Lingfeng Yang, Jian Yang, Zhigeng Pan

Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Knowledge distillation, which learns a lightweight student model by distilling knowledge from a cumbersome teacher model, is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-31 · Cuong Pham, Tuan Hoang, Thanh-Toan Do
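The title names multiple teachers, but the truncated abstract does not say how their knowledge is combined. The simplest multi-teacher baseline, shown below as an assumption and not as the collaborative scheme of this paper, softens each teacher's logits with a shared temperature and mixes the resulting distributions with fixed weights. The function names and the default temperature are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_teacher_targets(
    teacher_logits: Sequence[Sequence[float]],
    temperature: float = 4.0,  # illustrative default (assumption)
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine several teachers' softened outputs into one soft target.

    Each teacher's logits are softened with the same temperature, then mixed
    with non-negative weights that sum to 1 (uniform by default). A convex
    combination of probability distributions is itself a distribution.
    """
    k = len(teacher_logits)
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k or any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be k non-negative numbers summing to 1")
    dists = [softmax(logits, temperature) for logits in teacher_logits]
    n_classes = len(dists[0])
    return [sum(w * d[c] for w, d in zip(weights, dists)) for c in range(n_classes)]
```

The student would then be trained against this averaged target with the same temperature-scaled KL term used in single-teacher distillation.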

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science · 2023-02-02 · Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Variational Information Distillation for Knowledge Transfer

Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-12 · Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai

Improved Knowledge Distillation via Teacher Assistant

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress…

Machine Learning · Computer Science · 2019-12-18 · Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, Hassan Ghasemzadeh
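The abstract is cut off before the method, but the title names its core idea: placing intermediate-size "teacher assistant" models between the teacher and the student. The sketch below only lays out *who distills from whom*. It contrasts a chained schedule with a dense one in which every smaller model is guided by all larger models, which is our reading of the *Densely Guided Knowledge Distillation using Multiple Teacher Assistants* entry above (an assumption). The model names are hypothetical placeholders, and no training is performed.

```python
from typing import List, Tuple

# Each schedule entry is (guides, learner): the learner distills from every guide listed.
Schedule = List[Tuple[List[str], str]]


def chained_schedule(models: List[str]) -> Schedule:
    """Teacher-assistant chain: each model distills only from the next-larger one.

    `models` is ordered from largest (the teacher) to smallest (the student).
    """
    return [([models[i - 1]], models[i]) for i in range(1, len(models))]


def dense_schedule(models: List[str]) -> Schedule:
    """Dense guidance: each model distills from every larger model."""
    return [(models[:i], models[i]) for i in range(1, len(models))]


if __name__ == "__main__":
    names = ["teacher", "ta_1", "ta_2", "student"]  # hypothetical model names
    for guides, learner in dense_schedule(names):
        print(f"{learner} <- {', '.join(guides)}")
```

Both schedules train the same models in the same order; they differ only in how many guides each learner sees, so the final student in the dense schedule receives signal from the teacher directly as well as through every assistant.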

Extracting knowledge from features with multilevel abstraction

Knowledge distillation aims to transfer knowledge from a large teacher model to a small student model, greatly improving the performance of the student model. Therefore, the student network can replace the teacher network to…

Machine Learning · Computer Science · 2021-12-28 · Jinhong Lin, Zhaoyang Li

QUEST: Quantized embedding space for transferring knowledge

Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-21 · Himalaya Jain, Spyros Gidaris, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

Knowledge distillation, transferring knowledge from a teacher model to a student model, has emerged as a powerful technique in neural machine translation for compressing models or simplifying training targets. Knowledge distillation…

Computation and Language · Computer Science · 2024-04-24 · Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo
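The abstract contrasts two granularities of distillation without defining them in the visible text. In the common formulation used in neural machine translation (an assumption here, not taken from this paper), token-level distillation makes the student match the teacher's full distribution at every target position, while sentence-level distillation trains the student with ordinary negative log-likelihood on a sequence the teacher decodes. The toy sketch treats per-position distributions as fixed inputs, ignoring that in a real decoder they depend on the prefix fed in, and uses greedy decoding in place of beam search. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# One probability distribution over the vocabulary per target position.
Dists = Sequence[Sequence[float]]


def token_level_kd_loss(teacher: Dists, student: Dists) -> float:
    """Sum over positions of the cross-entropy H(teacher_t, student_t):
    the student matches the teacher's full distribution at every token."""
    return -sum(
        p * math.log(q)
        for p_t, q_t in zip(teacher, student)
        for p, q in zip(p_t, q_t)
        if p > 0
    )


def greedy_decode(teacher: Dists) -> List[int]:
    """Teacher's most likely token at each position (a stand-in for beam search)."""
    return [max(range(len(p_t)), key=lambda i: p_t[i]) for p_t in teacher]


def sentence_level_kd_loss(teacher: Dists, student: Dists) -> float:
    """Negative log-likelihood of the teacher's decoded output under the student:
    only the teacher's chosen tokens are used as (hard) targets."""
    pseudo_target = greedy_decode(teacher)
    return -sum(math.log(q_t[y]) for q_t, y in zip(student, pseudo_target))
```

The design trade-off is visible even in this toy: the token-level loss uses every probability the teacher assigns, while the sentence-level loss keeps only one token per position but lets the student learn from whole teacher-generated sequences.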

Interactive Knowledge Distillation

Knowledge distillation is a standard teacher-student learning framework to train a light-weight student network under the guidance of a well-trained large teacher network. As an effective teaching strategy, interactive teaching has been…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-16 · Shipeng Fu, Zhen Li, Jun Xu, Ming-Ming Cheng, Zitao Liu, Xiaomin Yang

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science · 2023-04-11 · Minghong Gao