English
Related papers

Related papers: Multi-head Knowledge Distillation for Model Compre…

200 papers

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…

Machine Learning · Computer Science 2023-01-31 Aref Jafari , Mehdi Rezagholizadeh , Ali Ghodsi

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an…

Computer Vision and Pattern Recognition · Computer Science 2021-10-25 Usma Niyaz , Deepti R. Bathula

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Fei Ding , Yin Yang , Hongxin Hu , Venkat Krovi , Feng Luo

Knowledge distillation is an effective technique for pre-trained language model compression. Although existing knowledge distillation methods perform well for the most typical model BERT, they could be further improved in two aspects: the…

Computation and Language · Computer Science 2024-07-04 Ying Zhang , Ziheng Yang , Shufan Ji

Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Guo-Hua Wang , Yifan Ge , Jianxin Wu

Multi-Teacher knowledge distillation provides students with additional supervision from multiple pre-trained teachers with diverse information sources. Most existing methods explore different weighting strategies to obtain a powerful…

Computer Vision and Pattern Recognition · Computer Science 2023-06-13 Hailin Zhang , Defang Chen , Can Wang

Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing computational and storage costs while maintaining competitive accuracy.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Fengming Yu , Haiwei Pan , Kejia Zhang , Jian Guan , Haiying Jiang

Knowledge distillation is considered as a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training. The teacher network is supposed to be a trustworthy predictor…

Computation and Language · Computer Science 2020-12-29 Peyman Passban , Yimeng Wu , Mehdi Rezagholizadeh , Qun Liu

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz , Elahe Arani , Bahram Zonooz

This paper explores sequence-level knowledge distillation (KD) of multilingual pre-trained encoder-decoder translation models. We argue that the teacher model's output distribution holds valuable insights for the student, beyond the…

Knowledge Distillation (KD) has emerged as a pivotal technique for neural network compression and performance enhancement. Most KD methods aim to transfer dark knowledge from a cumbersome teacher model to a lightweight student model based…

Machine Learning · Computer Science 2024-10-10 Wenqi Niu , Yingchao Wang , Guohui Cai , Hanpo Hou

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen , Jian-Ping Mei , Hailin Zhang , Can Wang , Yan Feng , Chun Chen

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and…

Computer Vision and Pattern Recognition · Computer Science 2019-09-10 Mengya Gao , Yujun Shen , Quanquan Li , Junjie Yan , Liang Wan , Dahua Lin , Chen Change Loy , Xiaoou Tang

Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (2) capacity gap when distilling from very large…

Machine Learning · Computer Science 2025-12-11 Gustavo Coelho Haase , Paulo Henrique Dourado da Silva

Knowledge distillation is an effective technique for pre-trained language model compression. However, existing methods only focus on the knowledge distribution among layers, which may cause the loss of fine-grained information in the…

Computation and Language · Computer Science 2026-04-06 Zihe Liu , Yulong Mao , Jinan Xu , Xinrui Peng , Kaiyu Huang

Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors. Existing state-of-the-art KD methods for object detection are mostly based on feature imitation. In this…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Jiabao Wang , Yuming Chen , Zhaohui Zheng , Xiang Li , Ming-Ming Cheng , Qibin Hou

Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map based variants explore knowledge transfer between manually assigned…

Computer Vision and Pattern Recognition · Computer Science 2021-08-31 Defang Chen , Jian-Ping Mei , Yuan Zhang , Can Wang , Yan Feng , Chun Chen

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance…

Machine Learning · Computer Science 2020-02-24 Mengya Gao , Yujun Shen , Quanquan Li , Chen Change Loy

Efficient deployment of deep neural networks on resource-constrained devices demands advanced compression techniques that preserve accuracy and interoperability. This paper proposes a machine learning framework that augments Knowledge…

Machine Learning · Computer Science 2025-03-18 David E. Hernandez , Jose Ramon Chang , Torbjörn E. M. Nordling

Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories…

Computer Vision and Pattern Recognition · Computer Science 2024-09-30 Yaomin Huang , Zaomin Yan , Chaomin Shen , Faming Fang , Guixu Zhang
‹ Prev 1 2 3 10 Next ›