Related papers: Knowledge Distillation Beyond Model Compression

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

Knowledge distillation (KD) is a new method for transferring knowledge of a structure under training to another one. The typical application of KD is in the form of learning a small model (named as a student) by soft labels produced by a…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-01 · Sajjad Abbasi, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi

Understanding and Improving Knowledge Distillation

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is…

Machine Learning · Computer Science · 2021-03-02 · Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

Heterogeneous Knowledge Distillation using Information Flow Modeling

Knowledge Distillation (KD) methods are capable of transferring the knowledge encoded in a large and complex teacher into a smaller and faster student. Early methods were usually limited to transferring the knowledge only between the last…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-05 · Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science · 2025-04-21 · Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…

Machine Learning · Computer Science · 2023-01-31 · Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Can Students Beyond The Teacher? Distilling Knowledge from Teacher's Bias

Knowledge distillation (KD) is a model compression technique that transfers knowledge from a large teacher model to a smaller student model to enhance its performance. Existing methods often assume that the student model is inherently…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-16 · Jianhua Zhang, Yi Gao, Ruyu Liu, Xu Cheng, Houxiang Zhang, Shengyong Chen

Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-25 · Usma Niyaz, Deepti R. Bathula

Residual Knowledge Distillation

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance…

Machine Learning · Computer Science · 2020-02-24 · Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

Dynamic Rectification Knowledge Distillation

Knowledge Distillation is a technique which aims to utilize dark knowledge to compress and transfer information from a vast, well-trained neural network (teacher model) to a smaller, less capable neural network (student model) with improved…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-28 · Fahad Rahman Amik, Ahnaf Ismat Tasin, Silvia Ahmed, M. M. Lutfe Elahi, Nabeel Mohammed

Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model

Benefiting from well-trained deep neural networks (DNNs), model compression have captured special attention for computing resource limited equipment, especially edge devices. Knowledge distillation (KD) is one of the widely used compression…

Machine Learning · Computer Science · 2024-06-06 · Jinyin Chen, Xiaoming Zhao, Haibin Zheng, Xiao Li, Sheng Xiang, Haifeng Guo

Knowledge Condensation Distillation

Knowledge Distillation (KD) transfers the knowledge from a high-capacity teacher network to strengthen a smaller student. Existing methods focus on excavating the knowledge hints and transferring the whole knowledge to the student. However,…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Chenxin Li, Mingbao Lin, Zhiyuan Ding, Nie Lin, Yihong Zhuang, Yue Huang, Xinghao Ding, Liujuan Cao

Improving Knowledge Distillation with Teacher's Explanation

Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This…

Machine Learning · Computer Science · 2023-10-05 · Sayantan Chowdhury, Ben Liang, Ali Tizghadam, Ilijc Albanese

Learning Interpretation with Explainable Knowledge Distillation

Knowledge Distillation (KD) has been considered as a key solution in model compression and acceleration in recent years. In KD, a small student model is generally trained from a large teacher model by minimizing the divergence between the…

Machine Learning · Computer Science · 2021-11-16 · Raed Alharbi, Minh N. Vu, My T. Thai

The Role of Teacher Calibration in Knowledge Distillation

Knowledge Distillation (KD) has emerged as an effective model compression technique in deep learning, enabling the transfer of knowledge from a large teacher model to a compact student model. While KD has demonstrated significant success,…

Machine Learning · Computer Science · 2025-08-29 · Suyoung Kim, Seonguk Park, Junhoo Lee, Nojun Kwak

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

Knowledge distillation (KD) is widely used for training a compact model with the supervision of another large model, which could effectively improve the performance. Previous methods mainly focus on two aspects: 1) training the student to…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-27 · Tiancheng Wen, Shenqi Lai, Xueming Qian

Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods

Knowledge distillation (KD) is an effective method for model compression and transferring knowledge between models. However, its effect on model's robustness against spurious correlations that degrade performance on out-of-distribution data…

Machine Learning · Computer Science · 2025-10-31 · Jiali Cheng, Chirag Agarwal, Hadi Amiri

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation

Knowledge distillation (KD) is a promising technique for model compression in neural machine translation. However, where the knowledge hides in KD is still not clear, which may hinder the development of KD. In this work, we first unravel…

Computation and Language · Computer Science · 2024-07-18 · Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu, Yufeng Chen

Annealing Knowledge Distillation

Significant memory and computational requirements of large deep neural networks restrict their application on edge devices. Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the…

Computation and Language · Computer Science · 2021-04-16 · Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

What Makes a Good Dataset for Knowledge Distillation?

Knowledge distillation (KD) has been a popular and effective method for model compression. One important assumption of KD is that the teacher's original dataset will also be available when training the student. However, in situations such…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Logan Frank, Jim Davis

Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

Knowledge Distillation (KD) has emerged as a pivotal technique for neural network compression and performance enhancement. Most KD methods aim to transfer dark knowledge from a cumbersome teacher model to a lightweight student model based…

Machine Learning · Computer Science · 2024-10-10 · Wenqi Niu, Yingchao Wang, Guohui Cai, Hanpo Hou