Related papers: Distilling Knowledge via Intermediate Classifiers

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge via teacher-student learning, where most existing distillation approaches apply a fixed pre-trained model as teacher to supervise the learning of student network. This…

Machine Learning · Computer Science 2021-03-26 Kangkai Zhang, Chunhui Zhang, Shikun Li, Dan Zeng, Shiming Ge

Efficient Knowledge Distillation from Model Checkpoints

Knowledge distillation is an effective approach to learn compact models (students) with the supervision of large and strong models (teachers). As empirically there exists a strong correlation between the performance of teacher and student…

Machine Learning · Computer Science 2022-10-13 Chaofei Wang, Qisen Yang, Rui Huang, Shiji Song, Gao Huang

Improved Knowledge Distillation via Teacher Assistant

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress…

Machine Learning · Computer Science 2019-12-18 Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, Hassan Ghasemzadeh

Distilling Image Classifiers in Object Detectors

Knowledge distillation constitutes a simple yet effective way to improve the performance of a compact student network by exploiting the knowledge of a more powerful teacher. Nevertheless, the knowledge distillation literature remains…

Computer Vision and Pattern Recognition · Computer Science 2022-02-11 Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann

What Knowledge Gets Distilled in Knowledge Distillation?

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science 2019-08-05 Frederick Tung, Greg Mori

Cooperative Knowledge Distillation: A Learner Agnostic Approach

Knowledge distillation is a simple but powerful way to transfer knowledge from a teacher model to a student model. Existing work suffers from at least one of the following key limitations in terms of direction and scope of transfer which…

Machine Learning · Computer Science 2024-02-12 Michael Livanos, Ian Davidson, Stephen Wong

Efficient Knowledge Distillation via Curriculum Extraction

Knowledge distillation is a technique used to train a small student network using the output generated by a large teacher network, and has many empirical advantages (Hinton et al., 2015). While the standard one-shot approach to…

Machine Learning · Computer Science 2025-03-25 Shivam Gupta, Sushrut Karmalkar

It's All in the Head: Representation Knowledge Distillation through Classifier Sharing

Representation knowledge distillation aims at transferring rich information from one model to another. Common approaches for representation distillation mainly focus on the direct minimization of distance metrics between the models'…

Computer Vision and Pattern Recognition · Computer Science 2022-04-06 Emanuel Ben-Baruch, Matan Karklinsky, Yossi Biton, Avi Ben-Cohen, Hussam Lawen, Nadav Zamir

Knowledge Distillation Layer that Lets the Student Decide

A typical technique in knowledge distillation (KD) is regularizing the learning of a limited-capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and…

Computer Vision and Pattern Recognition · Computer Science 2023-09-08 Ada Gorgun, Yeti Z. Gurbuz, A. Aydin Alatan

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed…

Machine Learning · Computer Science 2020-11-24 Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

Reducing the Teacher-Student Gap via Spherical Knowledge Distillation

Knowledge distillation aims at obtaining a compact and effective model by learning the mapping function from a much larger one. Due to the limited capacity of the student, the student would underfit the teacher. Therefore, student…

Machine Learning · Computer Science 2021-01-13 Jia Guo, Minghao Chen, Yao Hu, Chen Zhu, Xiaofei He, Deng Cai

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science 2023-04-11 Minghong Gao

Distilling Calibrated Student from an Uncalibrated Teacher

Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are…

Computer Vision and Pattern Recognition · Computer Science 2023-02-23 Ishan Mishra, Sethu Vamsi Krishna, Deepak Mishra

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Despite the different types, the guided knowledge used to distill is always kept unchanged for…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

Does Knowledge Distillation Really Work?

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not…

Machine Learning · Computer Science 2021-12-07 Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Improving Knowledge Distillation via Transferring Learning Ability

Existing knowledge distillation methods generally use a teacher-student approach, where the student network solely learns from a well-trained teacher. However, this approach overlooks the inherent differences in learning abilities between…

Computer Vision and Pattern Recognition · Computer Science 2023-09-19 Long Liu, Tong Li, Hui Cheng