Related papers: A Note on Knowledge Distillation Loss Function for…

Task Integration Distillation for Object Detectors

Knowledge distillation is a widely adopted technique for model lightening. However, the performance of most knowledge distillation methods in the domain of object detection is not satisfactory. Typically, knowledge distillation approaches…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Hai Su, ZhenWen Jian, Songsen Yu

Knowledge Distillation $\approx$ Label Smoothing: Fact or Fallacy?

Originally proposed as a method for knowledge transfer from one model to another, some recent studies have suggested that knowledge distillation (KD) is in fact a form of regularization. Perhaps the strongest argument of all for this new…

Machine Learning · Computer Science 2023-10-26 Md Arafat Sultan

The State of Knowledge Distillation for Classification

We survey various knowledge distillation (KD) strategies for simple classification tasks and implement a set of techniques that claim state-of-the-art accuracy. Our experiments using standardized model architectures, fixed compute budgets,…

Machine Learning · Computer Science 2019-12-24 Fabian Ruffy, Karanbir Chahal

Understanding the Role of the Projector in Knowledge Distillation

In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so we verify three important design decisions, namely the normalisation, soft maximum function, and projection…

Computer Vision and Pattern Recognition · Computer Science 2024-02-02 Roy Miles, Krystian Mikolajczyk

Class-aware Information for Logit-based Knowledge Distillation

Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Shuoxi Zhang, Hanpeng Liu, John E. Hopcroft, Kun He

What Knowledge Gets Distilled in Knowledge Distillation?

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee

Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study

This work aims to empirically clarify a recently discovered perspective that label smoothing is incompatible with knowledge distillation. We begin by introducing the motivation behind how this incompatibility is raised, i.e., label…

Machine Learning · Computer Science 2021-04-02 Zhiqiang Shen, Zechun Liu, Dejia Xu, Zitian Chen, Kwang-Ting Cheng, Marios Savvides

Knowledge Distillation with Refined Logits

Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang

Towards a Unified View of Affinity-Based Knowledge Distillation

Knowledge transfer between artificial neural networks has become an important topic in deep learning. Among the open questions are what kind of knowledge needs to be preserved for the transfer, and how it can be effectively achieved.…

Computer Vision and Pattern Recognition · Computer Science 2022-10-03 Vladimir Li, Atsuto Maki

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science 2019-08-05 Frederick Tung, Greg Mori

A Functional Perspective on Knowledge Distillation in Neural Networks

Knowledge distillation is considered a compression mechanism when judged on the resulting student's accuracy and loss, yet its functional impact is poorly understood. We quantify the compression capacity of knowledge distillation and the…

Machine Learning · Computer Science 2026-03-17 Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis

LoCa: Logit Calibration for Knowledge Distillation

Knowledge Distillation (KD), aiming to train a better student model by mimicking the teacher model, plays an important role in model compression. One typical way is to align the output logits. However, we find a common issue named…

Computation and Language · Computer Science 2024-09-10 Runming Yang, Taiqiang Wu, Yujiu Yang

Knowledge Distillation Performs Partial Variance Reduction

Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread…

Machine Learning · Computer Science 2023-12-12 Mher Safaryan, Alexandra Peste, Dan Alistarh

Localization Distillation for Dense Object Detection

Knowledge distillation (KD) has witnessed its powerful capability in learning compact models in object detection. Previous KD methods for object detection mostly focus on imitating deep features within the imitation regions instead of…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng Zuo, Qibin Hou, Ming-Ming Cheng

Decoupled Knowledge Distillation

State-of-the-art distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. To provide a novel viewpoint to study logit distillation, we…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang

Localization Distillation for Object Detection

Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the prediction logits due to its inefficiency in distilling the localization information. In this paper, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2022-12-09 Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, Wangmeng Zuo, Ming-Ming Cheng

Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Traditional knowledge distillation focuses on aligning the student's predicted probabilities with both ground-truth labels and the teacher's predicted probabilities. However, the transition to predicted probabilities from logits would…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An

Does Knowledge Distillation Really Work?

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not…

Machine Learning · Computer Science 2021-12-07 Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Distilling Image Classifiers in Object Detectors

Knowledge distillation constitutes a simple yet effective way to improve the performance of a compact student network by exploiting the knowledge of a more powerful teacher. Nevertheless, the knowledge distillation literature remains…

Computer Vision and Pattern Recognition · Computer Science 2022-02-11 Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann

A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient…

Machine Learning · Computer Science 2022-03-22 Yen-Chang Hsu, James Smith, Yilin Shen, Zsolt Kira, Hongxia Jin