Related papers: Progressive Class-level Distillation

Class-aware Information for Logit-based Knowledge Distillation

Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Shuoxi Zhang , Hanpeng Liu , John E. Hopcroft , Kun He

Parameter-Free Logit Distillation via Sorting Mechanism

Knowledge distillation (KD) aims to distill the knowledge from the teacher (larger) to the student (smaller) model via soft-label for the efficient neural network. In general, the performance of a model is determined by accuracy, which is…

Signal Processing · Electrical Eng. & Systems 2025-08-25 Stephen Ekaputra Limantoro

Knowledge Distillation with Refined Logits

Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wujie Sun , Defang Chen , Siwei Lyu , Genlang Chen , Chun Chen , Can Wang

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, it has become the de facto…

Machine Learning · Computer Science 2026-05-12 Ejafa Bassam , Dawei Zhu , Kaigui Bian

Swapped Logit Distillation via Bi-level Teacher Alignment

Knowledge distillation (KD) compresses the network capacity by transferring knowledge from a large (teacher) network to a smaller one (student). It has been mainstream that the teacher directly transfers knowledge to the student with its…

Machine Learning · Computer Science 2025-05-26 Stephen Ekaputra Limantoro , Jhe-Hao Lin , Chih-Yu Wang , Yi-Lung Tsai , Hong-Han Shuai , Ching-Chun Huang , Wen-Huang Cheng

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment

Knowledge Distillation (KD), a learning manner with a larger teacher network guiding a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a…

Machine Learning · Computer Science 2024-12-04 Chengting Yu , Fengzhao Zhang , Ruizhe Chen , Aili Wang , Zuozhu Liu , Shurun Tan , Er-Ping Li

Cross-View Consistency Regularisation for Knowledge Distillation

Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods are quickly catching up in performance with…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Weijia Zhang , Dongnan Liu , Weidong Cai , Chao Ma

Data Efficient Stagewise Knowledge Distillation

Despite the success of Deep Learning (DL), the deployment of modern DL models requiring large computational power poses a significant problem for resource-constrained systems. This necessitates building compact networks that reduce…

Machine Learning · Computer Science 2020-06-24 Akshay Kulkarni , Navid Panchi , Sharath Chandra Raparthy , Shital Chiddarwar

Adaptive Explicit Knowledge Transfer for Knowledge Distillation

Logit-based knowledge distillation (KD) for classification is cost-efficient compared to feature-based KD but often subject to inferior performance. Recently, it was shown that the performance of logit-based KD can be improved by…

Computer Vision and Pattern Recognition · Computer Science 2024-09-06 Hyungkeun Park , Jong-Seok Lee

A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient…

Machine Learning · Computer Science 2022-03-22 Yen-Chang Hsu , James Smith , Yilin Shen , Zsolt Kira , Hongxia Jin

Gradient Knowledge Distillation for Pre-trained Language Models

Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models mainly transfer knowledge by aligning…

Computation and Language · Computer Science 2022-11-03 Lean Wang , Lei Li , Xu Sun

Multi-level Knowledge Distillation via Knowledge Alignment and Correlation

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Fei Ding , Yin Yang , Hongxin Hu , Venkat Krovi , Feng Luo

Heterogeneous Complementary Distillation

Knowledge distillation (KD) transfers the dark knowledge from a complex teacher to a compact student. However, heterogeneous architecture distillation, such as Vision Transformer (ViT) to ResNet18, faces challenges due to differences in…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Liuchi Xu , Hao Zheng , Lu Wang , Lisheng Xu , Jun Cheng

Scale Decoupled Distillation

Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to the feature knowledge distillation. In this paper, we argue that existing…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Shicai Wei , Chunbo Luo , Yang Luo

Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning

We introduce the problem of continual distillation learning (CDL) in order to use knowledge distillation (KD) to improve prompt-based continual learning (CL) models. The CDL problem is valuable to study since the use of a larger vision…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Qifan Zhang , Yunhui Guo , Yu Xiang

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang , Junhao Song , Xudong Han , Ziqian Bi , Tianyang Wang , Chia Xin Liang , Xinyuan Song , Yichao Zhang , Qian Niu , Benji Peng , Keyu Chen , Ming Liu

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches exhibit…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Wencheng Zhu , Xin Zhou , Pengfei Zhu , Yu Wang , Qinghua Hu

Curriculum Learning-Guided Progressive Distillation in Large Language Models

Knowledge distillation is a key technique for transferring the capabilities of large language models (LLMs) into smaller, more efficient student models. Existing distillation approaches often overlook two critical factors: the learning…

Machine Learning · Computer Science 2026-05-13 Jincheng Cao , Fanzhi Zeng , Leqi Liu , Aryan Mokhtari

DistillLens: Symmetric Knowledge Distillation Through Logit Lens

Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the teacher's intermediate layer's thought process as a black box. While feature-based distillation attempts…

Computation and Language · Computer Science 2026-02-17 Manish Dhakal , Uthman Jinadu , Anjila Budathoki , Rajshekhar Sunderraman , Yi Ding

Preview-based Category Contrastive Learning for Knowledge Distillation

Knowledge distillation is a mainstream algorithm in model compression by transferring knowledge from the larger model (teacher) to the smaller model (student) to improve the performance of student. Despite many efforts, existing methods…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Muhe Ding , Jianlong Wu , Xue Dong , Xiaojie Li , Pengda Qin , Tian Gan , Liqiang Nie