Related papers: Exploring Inconsistent Knowledge Distillation for …

Understanding and Improving Knowledge Distillation

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is…

Machine Learning · Computer Science 2021-03-02 Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Beyond Classification: Knowledge Distillation using Multi-Object Impressions

Knowledge Distillation (KD) utilizes training data as a transfer set to transfer knowledge from a complex network (Teacher) to a smaller network (Student). Several works have recently identified many scenarios where the training data may…

Computer Vision and Pattern Recognition · Computer Science 2021-10-28 Gaurav Kumar Nayak, Monish Keswani, Sharan Seshadri, Anirban Chakraborty

Learning to Teach with Student Feedback

Knowledge distillation (KD) has gained much attention due to its effectiveness in compressing large-scale pre-trained models. In typical KD methods, the small student model is trained to match the soft targets generated by the big teacher…

Machine Learning · Computer Science 2021-09-13 Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

Knowledge distillation (KD) is widely used for training a compact model with the supervision of another large model, which could effectively improve the performance. Previous methods mainly focus on two aspects: 1) training the student to…

Computer Vision and Pattern Recognition · Computer Science 2020-07-27 Tiancheng Wen, Shenqi Lai, Xueming Qian

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…

Machine Learning · Computer Science 2023-01-31 Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Knowledge Distillation (KD) is a widely-used technology to inherit information from cumbersome teacher models to compact student models, consequently realizing model compression and acceleration. Compared with image classification, object…

Computer Vision and Pattern Recognition · Computer Science 2021-12-10 Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang

Discriminative and Consistent Representation Distillation

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its…

Computer Vision and Pattern Recognition · Computer Science 2025-05-14 Nikolaos Giakoumoglou, Tania Stathaki

Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection

Knowledge distillation (KD) is an effective method for compressing models in object detection tasks. Due to limited computational capability, UAV-based object detection (UAV-OD) widely adopts the KD technique to obtain lightweight detectors.…

Computer Vision and Pattern Recognition · Computer Science 2024-08-22 Liang Yao, Fan Liu, Chuanyi Zhang, Zhiquan Ou, Ting Wu

Gradient-Guided Knowledge Distillation for Object Detectors

Deep learning models have demonstrated remarkable success in object detection, yet their complexity and computational intensity pose a barrier to deploying them in real-world applications (e.g., self-driving perception). Knowledge…

Computer Vision and Pattern Recognition · Computer Science 2023-03-09 Qizhen Lan, Qing Tian

Dynamic Rectification Knowledge Distillation

Knowledge Distillation is a technique which aims to utilize dark knowledge to compress and transfer information from a vast, well-trained neural network (teacher model) to a smaller, less capable neural network (student model) with improved…

Computer Vision and Pattern Recognition · Computer Science 2022-01-28 Fahad Rahman Amik, Ahnaf Ismat Tasin, Silvia Ahmed, M. M. Lutfe Elahi, Nabeel Mohammed

Knowledge Distillation Beyond Model Compression

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz, Elahe Arani, Bahram Zonooz

Comparative Knowledge Distillation

In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance.…

Machine Learning · Computer Science 2023-11-07 Alex Wilf, Alex Tianyi Xu, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency

HARD: Hard Augmentations for Robust Distillation

Knowledge distillation (KD) is a simple and successful method to transfer knowledge from a teacher to a student model solely based on functional activity. However, current KD has a few shortcomings: it has recently been shown that this…

Computer Vision and Pattern Recognition · Computer Science 2023-05-26 Arne F. Nix, Max F. Burg, Fabian H. Sinz

Towards Efficient 3D Object Detection with Knowledge Distillation

Despite substantial progress in 3D object detection, advanced 3D detectors often suffer from heavy computation overheads. To this end, we explore the potential of knowledge distillation (KD) for developing efficient 3D object detectors,…

Computer Vision and Pattern Recognition · Computer Science 2022-10-17 Jihan Yang, Shaoshuai Shi, Runyu Ding, Zhe Wang, Xiaojuan Qi

Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism

Does Knowledge Distillation (KD) really work? Conventional wisdom viewed it as a knowledge transfer procedure where a perfect mimicry of the student to its teacher is desired. However, paradoxical studies indicate that closely replicating…

Machine Learning · Computer Science 2024-05-03 Chenqi Guo, Shiwei Zhong, Xiaofeng Liu, Qianli Feng, Yinglong Ma

Distilling Invariant Representations with Dual Augmentation

Knowledge distillation (KD) has been widely used to transfer knowledge from large, accurate models (teachers) to smaller, efficient ones (students). Recent methods have explored enforcing consistency by incorporating causal interpretations…

Computer Vision and Pattern Recognition · Computer Science 2025-07-17 Nikolaos Giakoumoglou, Tania Stathaki

Role-Wise Data Augmentation for Knowledge Distillation

Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model (the teacher) into another model (the student), where typically, the teacher has a greater capacity…

Machine Learning · Computer Science 2020-04-21 Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong

Improving Knowledge Distillation with Teacher's Explanation

Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This…

Machine Learning · Computer Science 2023-10-05 Sayantan Chowdhury, Ben Liang, Ali Tizghadam, Ilijc Albanese

IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors

Knowledge distillation (KD) has been proven to be useful for training compact object detection models. However, we observe that KD is often effective when the teacher model and student counterpart share similar proposal information. This…

Computer Vision and Pattern Recognition · Computer Science 2022-10-10 Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Peng Gao, Jinhu Lv