Related papers: Knowledge Diffusion for Distillation

Teacher-Guided Student Self-Knowledge Distillation Using Diffusion Model

Existing Knowledge Distillation (KD) methods often align feature information between teacher and student by exploring meaningful feature processing and loss functions. However, due to the difference in feature distributions between the…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-03 · Yu Wang, Chuanguang Yang, Zhulin An, Weilun Feng, Jiarui Zhao, Chengqing Yu, Libo Huang, Boyu Diao, Yongjun Xu

Improving Knowledge Distillation via Regularizing Feature Norm and Direction

Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task. Treating teacher features as knowledge, prevailing methods of knowledge distillation train…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-29 · Yuzhu Wang, Lechao Cheng, Manni Duan, Yongheng Wang, Zunlei Feng, Shu Kong

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science · 2025-04-21 · Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance

Speech denoising is a generally adopted and impactful task, appearing in many common and everyday-life use cases. Although there are very powerful methods published, most of those are too complex for deployment in everyday and low-resources…

Sound · Computer Science · 2025-05-07 · Diep Luong, Mikko Heikkinen, Konstantinos Drossos, Tuomas Virtanen

Data-free Knowledge Distillation with Diffusion Models

Recently Data-Free Knowledge Distillation (DFKD) has garnered attention and can transfer knowledge from a teacher neural network to a student neural network without requiring any access to training data. Although diffusion models are adept…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-02 · Xiaohua Qi, Renda Li, Long Peng, Qiang Ling, Jun Yu, Ziyi Chen, Peng Chang, Mei Han, Jing Xiao

Knowledge Distillation with Deep Supervision

Knowledge distillation aims to enhance the performance of a lightweight student model by exploiting the knowledge from a pre-trained cumbersome teacher model. However, in the traditional knowledge distillation, teacher predictions are only…

Machine Learning · Computer Science · 2023-05-26 · Shiya Luo, Defang Chen, Can Wang

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

Knowledge distillation (KD) has shown very promising capabilities in transferring learning representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger,…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-24 · Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang

Distilling Knowledge by Mimicking Features

Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-17 · Guo-Hua Wang, Yifan Ge, Jianxin Wu

Discriminative and Consistent Representation Distillation

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-14 · Nikolaos Giakoumoglou, Tania Stathaki

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment

Knowledge Distillation (KD), a learning manner with a larger teacher network guiding a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a…

Machine Learning · Computer Science · 2024-12-04 · Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Aili Wang, Zuozhu Liu, Shurun Tan, Er-Ping Li

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Knowledge Distillation (KD) is a widely-used technology to inherit information from cumbersome teacher models to compact student models, consequently realizing model compression and acceleration. Compared with image classification, object…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-10 · Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang

Knowledge Distillation Performs Partial Variance Reduction

Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread…

Machine Learning · Computer Science · 2023-12-12 · Mher Safaryan, Alexandra Peste, Dan Alistarh

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

An Embarrassingly Simple Approach for Knowledge Distillation

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-10 · Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Improving Knowledge Distillation with Teacher's Explanation

Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This…

Machine Learning · Computer Science · 2023-10-05 · Sayantan Chowdhury, Ben Liang, Ali Tizghadam, Ilijc Albanese

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

Knowledge distillation (KD) is a new method for transferring knowledge of a structure under training to another one. The typical application of KD is in the form of learning a small model (named as a student) by soft labels produced by a…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-01 · Sajjad Abbasi, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi

Extracting knowledge from features with multilevel abstraction

Knowledge distillation aims at transferring the knowledge from a large teacher model to a small student model with great improvements of the performance of the student model. Therefore, the student network can replace the teacher network to…

Machine Learning · Computer Science · 2021-12-28 · Jinhong Lin, Zhaoyang Li

ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition

Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one, which is widely used to enhance model performance in machine learning. It tries to align embedding spaces generated from the teacher and the…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-03 · Weidong Shi, Guanghui Ren, Yunpeng Chen, Shuicheng Yan

Distilling a Powerful Student Model via Online Knowledge Distillation

Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores other students' information, while the…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-18 · Shaojie Li, Mingbao Lin, Yan Wang, Yongjian Wu, Yonghong Tian, Ling Shao, Rongrong Ji

FiGKD: Fine-Grained Knowledge Distillation via High-Frequency Detail Transfer

Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from a high-capacity teacher model to a smaller student model by aligning their output distributions. However, existing methods often underperform in…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-25 · Seonghak Kim