Related papers: Relational Knowledge Distillation

Locally Linear Region Knowledge Distillation

Knowledge distillation (KD) is an effective technique to transfer knowledge from one neural network (teacher) to another (student), thus improving the performance of the student. To make the student better mimic the behavior of the teacher,…

Machine Learning · Computer Science 2020-10-20 Xiang Deng , Zhongfei , Zhang

Relational Representation Distillation

Knowledge distillation involves transferring knowledge from large, cumbersome teacher models to more compact student models. The standard approach minimizes the Kullback-Leibler (KL) divergence between the probabilistic outputs of a teacher…

Computer Vision and Pattern Recognition · Computer Science 2025-05-14 Nikolaos Giakoumoglou , Tania Stathaki

Residual Knowledge Distillation

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance…

Machine Learning · Computer Science 2020-02-24 Mengya Gao , Yujun Shen , Quanquan Li , Chen Change Loy

Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias

Knowledge distillation typically transfers knowledge from a teacher model to a student model by minimizing differences between their output distributions. However, existing distillation approaches largely focus on mimicking absolute…

Machine Learning · Computer Science 2025-04-30 Chao Li , Changhua Zhou , Jia Chen

Quantum Relational Knowledge Distillation

Knowledge distillation (KD) is a widely adopted technique for compressing large models into smaller, more efficient student models that can be deployed on devices with limited computational resources. Among various KD methods, Relational…

Quantum Physics · Physics 2025-08-19 Chen-Yu Liu , Kuan-Cheng Chen , Keisuke Murota , Samuel Yen-Chi Chen , Enrico Rinaldi

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

Knowledge distillation (KD) is a new method for transferring knowledge of a structure under training to another one. The typical application of KD is in the form of learning a small model (named as a student) by soft labels produced by a…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Sajjad Abbasi , Mohsen Hajabdollahi , Nader Karimi , Shadrokh Samavi

Gradient Knowledge Distillation for Pre-trained Language Models

Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models mainly transfer knowledge by aligning…

Computation and Language · Computer Science 2022-11-03 Lean Wang , Lei Li , Xu Sun

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang , Junhao Song , Xudong Han , Ziqian Bi , Tianyang Wang , Chia Xin Liang , Xinyuan Song , Yichao Zhang , Qian Niu , Benji Peng , Keyu Chen , Ming Liu

Knowledge Distillation via Token-level Relationship Graph

Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the true potential of knowledge transfer has not been fully explored. Existing approaches primarily…

Machine Learning · Computer Science 2023-06-23 Shuoxi Zhang , Hanpeng Liu , Kun He

Triplet Loss for Knowledge Distillation

In recent years, deep learning has spread rapidly, and deeper, larger models have been proposed. However, the calculation cost becomes enormous as the size of the models becomes larger. Various techniques for compressing the size of the…

Machine Learning · Computer Science 2020-04-20 Hideki Oki , Motoshi Abe , Junichi Miyao , Takio Kurita

Knowledge Condensation Distillation

Knowledge Distillation (KD) transfers the knowledge from a high-capacity teacher network to strengthen a smaller student. Existing methods focus on excavating the knowledge hints and transferring the whole knowledge to the student. However,…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Chenxin Li , Mingbao Lin , Zhiyuan Ding , Nie Lin , Yihong Zhuang , Yue Huang , Xinghao Ding , Liujuan Cao

Complementary Relation Contrastive Distillation

Knowledge distillation aims to transfer representation ability from a teacher model to a student model. Previous approaches focus on either individual representation distillation or inter-sample similarity preservation. While we argue that…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Jinguo Zhu , Shixiang Tang , Dapeng Chen , Shijie Yu , Yakun Liu , Aijun Yang , Mingzhe Rong , Xiaohua Wang

What Knowledge Gets Distilled in Knowledge Distillation?

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has a been a deluge of novel…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Utkarsh Ojha , Yuheng Li , Anirudh Sundara Rajan , Yingyu Liang , Yong Jae Lee

Discriminative and Consistent Representation Distillation

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its…

Computer Vision and Pattern Recognition · Computer Science 2025-05-14 Nikolaos Giakoumoglou , Tania Stathaki

Multi-level Knowledge Distillation via Knowledge Alignment and Correlation

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Fei Ding , Yin Yang , Hongxin Hu , Venkat Krovi , Feng Luo

Dynamic Rectification Knowledge Distillation

Knowledge Distillation is a technique which aims to utilize dark knowledge to compress and transfer information from a vast, well-trained neural network (teacher model) to a smaller, less capable neural network (student model) with improved…

Computer Vision and Pattern Recognition · Computer Science 2022-01-28 Fahad Rahman Amik , Ahnaf Ismat Tasin , Silvia Ahmed , M. M. Lutfe Elahi , Nabeel Mohammed

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…

Machine Learning · Computer Science 2023-01-31 Aref Jafari , Mehdi Rezagholizadeh , Ali Ghodsi

Understanding and Improving Knowledge Distillation

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is…

Machine Learning · Computer Science 2021-03-02 Jiaxi Tang , Rakesh Shivanna , Zhe Zhao , Dong Lin , Anima Singh , Ed H. Chi , Sagar Jain

Knowledge Distillation Beyond Model Compression

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz , Elahe Arani , Bahram Zonooz

Collaborative Teacher-Student Learning via Multiple Knowledge Transfer

Knowledge distillation (KD), as an efficient and effective model compression technique, has been receiving considerable attention in deep learning. The key to its success is to transfer knowledge from a large teacher network to a small…

Machine Learning · Computer Science 2021-01-28 Liyuan Sun , Jianping Gou , Baosheng Yu , Lan Du , Dacheng Tao