Related papers: Function-Consistent Feature Distillation

Focal and Global Knowledge Distillation for Detectors

Knowledge distillation has been applied to image classification successfully. However, object detection is much more sophisticated and most knowledge distillation methods have failed on it. In this paper, we point out that in object…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-10 · Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan

Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets

We propose a novel teacher-student framework to distill knowledge from multiple teachers trained on distinct datasets. Each teacher is first trained from scratch on its own dataset. Then, the teachers are combined into a joint architecture,…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-30 · Adrian Iordache, Bogdan Alexe, Radu Tudor Ionescu

Feature-domain Adaptive Contrastive Distillation for Efficient Single Image Super-Resolution

Recently, CNN-based SISR has numerous parameters and high computational cost to achieve better performance, limiting its applicability to resource-constrained devices such as mobile. As one of the methods to make the network efficient,…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-27 · HyeonCheol Moon, JinWoo Jeong, SungJei Kim

Improving Knowledge Distillation via Regularizing Feature Norm and Direction

Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task. Treating teacher features as knowledge, prevailing methods of knowledge distillation train…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-29 · Yuzhu Wang, Lechao Cheng, Manni Duan, Yongheng Wang, Zunlei Feng, Shu Kong

A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

Knowledge distillation is a popular technique for transferring the knowledge from a large teacher model to a smaller student model by mimicking. However, distillation by directly aligning the feature maps between teacher and student may…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-27 · Ziwei Liu, Yongtao Wang, Xiaojie Chu

What Should Feature Distillation Transfer in LLMs? A Task-Tangent Geometry View

Feature-based knowledge distillation aims to transfer intermediate representations from a teacher LLM model to a student. Existing approaches typically rely on direct feature matching or learned projections, implicitly treating…

Computation and Language · Computer Science · 2026-02-11 · Khouloud Saadi, Di Wang

Matching Guided Distillation

Feature distillation is an effective way to improve the performance for a smaller student model, which has fewer parameters and lower computation cost compared to the larger teacher model. Unfortunately, there is a common obstacle - the gap…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-14 · Kaiyu Yue, Jiangfan Deng, Feng Zhou

Distilling a Powerful Student Model via Online Knowledge Distillation

Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores other students' information, while the…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-18 · Shaojie Li, Mingbao Lin, Yan Wang, Yongjian Wu, Yonghong Tian, Ling Shao, Rongrong Ji

Knowledge Diffusion for Distillation

The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve the performance, current methods often resort to complicated training schemes, loss functions, and feature…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

Distilling Knowledge by Mimicking Features

Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-17 · Guo-Hua Wang, Yifan Ge, Jianxin Wu

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Despite the different types, the guided knowledge used to distill is always kept unchanged for…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-01 · Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

FEED: Feature-level Ensemble for Knowledge Distillation

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-25 · SeongUk Park, Nojun Kwak

Frequency Attention for Knowledge Distillation

Knowledge distillation is an attractive approach for learning compact deep neural networks, which learns a lightweight student model by distilling knowledge from a complex teacher model. Attention-based knowledge distillation is a specific…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Cuong Pham, Van-Anh Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do

Improved Feature Distillation via Projector Ensemble

In knowledge distillation, previous feature distillation methods mainly focus on the design of loss functions and the selection of the distilled layers, while the effect of the feature projector between the student and the teacher remains…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-02 · Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Zi Huang

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Knowledge Distillation (KD) is a widely-used technology to inherit information from cumbersome teacher models to compact student models, consequently realizing model compression and acceleration. Compared with image classification, object…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-10 · Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang

Knowledge distillation through geometry-aware representational alignment

Knowledge distillation is a common paradigm for transferring capabilities from larger models to smaller ones. While traditional distillation methods leverage a probabilistic divergence over the output of the teacher and student models,…

Machine Learning · Computer Science · 2025-10-01 · Prajjwal Bhattarai, Mohammad Amjad, Dmytro Zhylko, Tuka Alhanai

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation

In this paper, we investigate the knowledge distillation (KD) strategy for object detection and propose an effective framework applicable to both homogeneous and heterogeneous student-teacher pairs. The conventional feature imitation…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-13 · Lewei Yao, Renjie Pi, Hang Xu, Wei Zhang, Zhenguo Li, Tong Zhang

Federated Knowledge Distillation

Distributed learning frameworks often rely on exchanging model parameters across workers, instead of revealing their raw data. A prime example is federated learning that exchanges the gradients or weights of each neural network model. Under…

Machine Learning · Computer Science · 2020-11-05 · Hyowoon Seo, Jihong Park, Seungeun Oh, Mehdi Bennis, Seong-Lyun Kim

A Comprehensive Overhaul of Feature Distillation

We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-12 · Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi

Feature Fusion for Online Mutual Knowledge Distillation

We propose a learning framework named Feature Fusion Learning (FFL) that efficiently trains a powerful classifier through a fusion module which combines the feature maps generated from parallel neural networks. Specifically, we train a…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-22 · Jangho Kim, Minsung Hyun, Inseop Chung, Nojun Kwak