Related papers: Structural Knowledge Distillation for Object Detec…

Improving Knowledge Distillation via Regularizing Feature Norm and Direction

Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task. Treating teacher features as knowledge, prevailing methods of knowledge distillation train…

Computer Vision and Pattern Recognition · Computer Science 2023-05-29 Yuzhu Wang , Lechao Cheng , Manni Duan , Yongheng Wang , Zunlei Feng , Shu Kong

Architectural Insights into Knowledge Distillation for Object Detection: A Comprehensive Review

Object detection has achieved remarkable accuracy through deep learning, yet these improvements often come with increased computational cost, limiting deployment on resource-constrained devices. Knowledge Distillation (KD) provides an…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Mahdi Golizadeh , Nassibeh Golizadeh , Mohammad Ali Keyvanrad , Hossein Shirazi

An Embarrassingly Simple Approach for Knowledge Distillation

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and…

Computer Vision and Pattern Recognition · Computer Science 2019-09-10 Mengya Gao , Yujun Shen , Quanquan Li , Junjie Yan , Liang Wan , Dahua Lin , Chen Change Loy , Xiaoou Tang

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Knowledge Distillation (KD) is a widely-used technology to inherit information from cumbersome teacher models to compact student models, consequently realizing model compression and acceleration. Compared with image classification, object…

Computer Vision and Pattern Recognition · Computer Science 2021-12-10 Gang Li , Xiang Li , Yujie Wang , Shanshan Zhang , Yichao Wu , Ding Liang

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang , Junhao Song , Xudong Han , Ziqian Bi , Tianyang Wang , Chia Xin Liang , Xinyuan Song , Yichao Zhang , Qian Niu , Benji Peng , Keyu Chen , Ming Liu

Localization Distillation for Dense Object Detection

Knowledge distillation (KD) has witnessed its powerful capability in learning compact models in object detection. Previous KD methods for object detection mostly focus on imitating deep features within the imitation regions instead of…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Zhaohui Zheng , Rongguang Ye , Ping Wang , Dongwei Ren , Wangmeng Zuo , Qibin Hou , Ming-Ming Cheng

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

Knowledge distillation (KD) is a new method for transferring knowledge of a structure under training to another one. The typical application of KD is in the form of learning a small model (named as a student) by soft labels produced by a…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Sajjad Abbasi , Mohsen Hajabdollahi , Nader Karimi , Shadrokh Samavi

Locally Linear Region Knowledge Distillation

Knowledge distillation (KD) is an effective technique to transfer knowledge from one neural network (teacher) to another (student), thus improving the performance of the student. To make the student better mimic the behavior of the teacher,…

Machine Learning · Computer Science 2020-10-20 Xiang Deng , Zhongfei Zhang

RdimKD: Generic Distillation Paradigm by Dimensionality Reduction

Knowledge Distillation (KD) emerges as one of the most promising compression technologies to run advanced deep neural networks on resource-limited devices. In order to train a small network (student) under the guidance of a large network…

Machine Learning · Computer Science 2023-12-15 Yi Guo , Yiqian He , Xiaoyang Li , Haotong Qin , Van Tung Pham , Yang Zhang , Shouda Liu

Towards Zero-Shot Knowledge Distillation for Natural Language Processing

Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the…

Computation and Language · Computer Science 2021-01-01 Ahmad Rashid , Vasileios Lioutas , Abbas Ghaddar , Mehdi Rezagholizadeh

Distilling Knowledge by Mimicking Features

Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Guo-Hua Wang , Yifan Ge , Jianxin Wu

Beyond Classification: Knowledge Distillation using Multi-Object Impressions

Knowledge Distillation (KD) utilizes training data as a transfer set to transfer knowledge from a complex network (Teacher) to a smaller network (Student). Several works have recently identified many scenarios where the training data may…

Computer Vision and Pattern Recognition · Computer Science 2021-10-28 Gaurav Kumar Nayak , Monish Keswani , Sharan Seshadri , Anirban Chakraborty

A Cohesive Distillation Architecture for Neural Language Models

A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size, which prevents research groups without a necessary hardware infrastructure from participating in the development process. This study…

Computation and Language · Computer Science 2023-01-31 Jan Philip Wahle

Knowledge Distillation Beyond Model Compression

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz , Elahe Arani , Bahram Zonooz

Residual Knowledge Distillation

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance…

Machine Learning · Computer Science 2020-02-24 Mengya Gao , Yujun Shen , Quanquan Li , Chen Change Loy

Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation

Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. Since the teacher model perceives data in a way different from humans, existing KD methods only distill…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Jiawei Liang , Siyuan Liang , Aishan Liu , Ke Ma , Jingzhi Li , Xiaochun Cao

Towards Efficient 3D Object Detection with Knowledge Distillation

Despite substantial progress in 3D object detection, advanced 3D detectors often suffer from heavy computation overheads. To this end, we explore the potential of knowledge distillation (KD) for developing efficient 3D object detectors,…

Computer Vision and Pattern Recognition · Computer Science 2022-10-17 Jihan Yang , Shaoshuai Shi , Runyu Ding , Zhe Wang , Xiaojuan Qi

Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

Knowledge Distillation (KD) has emerged as a pivotal technique for neural network compression and performance enhancement. Most KD methods aim to transfer dark knowledge from a cumbersome teacher model to a lightweight student model based…

Machine Learning · Computer Science 2024-10-10 Wenqi Niu , Yingchao Wang , Guohui Cai , Hanpo Hou

Gradient-Guided Knowledge Distillation for Object Detectors

Deep learning models have demonstrated remarkable success in object detection, yet their complexity and computational intensity pose a barrier to deploying them in real-world applications (e.g., self-driving perception). Knowledge…

Computer Vision and Pattern Recognition · Computer Science 2023-03-09 Qizhen Lan , Qing Tian

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

Deep neural models in recent years have been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding more…

Computer Vision and Pattern Recognition · Computer Science 2021-06-18 Lin Wang , Kuk-Jin Yoon