Related papers: Decoupled Knowledge Distillation

200 papers

In the history of knowledge distillation, the focus has once shifted over time from logit-based to feature-based approaches. However, this transition has been revisited with the advent of Decoupled Knowledge Distillation (DKD), which…

Machine Learning · Computer Science 2025-12-05 Bowen Zheng , Ran Cheng

Knowledge Distillation (KD), a learning manner with a larger teacher network guiding a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a…

Machine Learning · Computer Science 2024-12-04 Chengting Yu , Fengzhao Zhang , Ruizhe Chen , Aili Wang , Zuozhu Liu , Shurun Tan , Er-Ping Li

Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Shuoxi Zhang , Hanpeng Liu , John E. Hopcroft , Kun He

Compared with the feature-based distillation methods, logits distillation can liberalize the requirements of consistent feature dimension between teacher and student networks, while the performance is deemed inferior in face recognition.…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Weisong Zhao , Xiangyu Zhu , Kaiwen Guo , Xiao-Yu Zhang , Zhen Lei

Recent advances in knowledge distillation (KD) predominantly emphasize feature-level knowledge transfer, frequently overlooking critical information embedded within the teacher's logit distributions. In this paper, we revisit logit-based…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Qi Wang , Jinjia Zhou

Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to the feature knowledge distillation. In this paper, we argue that existing…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Shicai Wei , Chunbo Luo , Yang Luo

Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient…

Machine Learning · Computer Science 2022-03-22 Yen-Chang Hsu , James Smith , Yilin Shen , Zsolt Kira , Hongxia Jin

Knowledge distillation (KD) methods can transfer knowledge of a parameter-heavy teacher model to a light-weight student model. The status quo for feature KD methods is to utilize loss functions based on logits (i.e., pre-softmax class…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Nicholas Cooper , Lijun Chen , Sailesh Dwivedy , Danna Gurari

Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to guide the student, while self-KD does not need a real teacher to require the soft labels. This work unifies the formulations of the two tasks by decomposing…

Computer Vision and Pattern Recognition · Computer Science 2023-07-18 Zhendong Yang , Ailing Zeng , Zhe Li , Tianke Zhang , Chun Yuan , Yu Li

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and…

Computer Vision and Pattern Recognition · Computer Science 2019-09-10 Mengya Gao , Yujun Shen , Quanquan Li , Junjie Yan , Liang Wan , Dahua Lin , Chen Change Loy , Xiaoou Tang

Existing knowledge distillation (KD) methods have demonstrated their ability in achieving student network performance on par with their teachers. However, the knowledge gap between the teacher and student remains significant and may hinder…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Shuoxi Zhang , Zijian Song , Kun He

Recent advances in knowledge distillation have emphasized the importance of decoupling different knowledge components. While existing methods utilize momentum mechanisms to separate task-oriented and distillation gradients, they overlook…

Computer Vision and Pattern Recognition · Computer Science 2025-05-22 Haiduo Huang , Jiangcheng Song , Yadong Zhang , Pengju Ren

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang , Junhao Song , Xudong Han , Ziqian Bi , Tianyang Wang , Chia Xin Liang , Xinyuan Song , Yichao Zhang , Qian Niu , Benji Peng , Keyu Chen , Ming Liu

In knowledge distillation (KD), logit distillation (LD) aims to transfer class-level knowledge from a more powerful teacher network to a small student model via accurate teacher-student alignment at the logits level. Since high-confidence…

Computer Vision and Pattern Recognition · Computer Science 2025-06-02 Jiayan Li , Jun Li , Zhourui Zhang , Jianhua Xu

Class-incremental semantic segmentation (CISS) labels each pixel of an image with a corresponding object/stuff class continually. To this end, it is crucial to learn novel classes incrementally without forgetting previously learned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-13 Donghyeon Baek , Youngmin Oh , Sanghoon Lee , Junghyup Lee , Bumsub Ham

Traditional knowledge distillation focuses on aligning the student's predicted probabilities with both ground-truth labels and the teacher's predicted probabilities. However, the transition to predicted probabilities from logits would…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Penghui Yang , Chen-Chen Zong , Sheng-Jun Huang , Lei Feng , Bo An

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance…

Machine Learning · Computer Science 2020-02-24 Mengya Gao , Yujun Shen , Quanquan Li , Chen Change Loy

Deep neural networks (DNNs) have improved NLP tasks significantly, but training and maintaining such networks could be costly. Model compression techniques, such as knowledge distillation (KD), have been proposed to address the issue;…

Computation and Language · Computer Science 2023-11-08 Manas Mohanty , Tanya Roosta , Peyman Passban

In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches exhibit…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Wencheng Zhu , Xin Zhou , Pengfei Zhu , Yu Wang , Qinghua Hu

Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the prediction logits due to its inefficiency in distilling the localization information. In this paper, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2022-12-09 Zhaohui Zheng , Rongguang Ye , Qibin Hou , Dongwei Ren , Ping Wang , Wangmeng Zuo , Ming-Ming Cheng