Related papers: Progressive Class-level Distillation

200 papers

Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Shuoxi Zhang , Hanpeng Liu , John E. Hopcroft , Kun He

Knowledge distillation (KD) aims to distill the knowledge from the teacher (larger) to the student (smaller) model via soft-label for the efficient neural network. In general, the performance of a model is determined by accuracy, which is…

Signal Processing · Electrical Eng. & Systems 2025-08-25 Stephen Ekaputra Limantoro

Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wujie Sun , Defang Chen , Siwei Lyu , Genlang Chen , Chun Chen , Can Wang

Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, it has become the de facto…

Machine Learning · Computer Science 2026-05-12 Ejafa Bassam , Dawei Zhu , Kaigui Bian

Knowledge distillation (KD) compresses the network capacity by transferring knowledge from a large (teacher) network to a smaller one (student). It has been mainstream that the teacher directly transfers knowledge to the student with its…

Knowledge Distillation (KD), a learning manner with a larger teacher network guiding a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a…

Machine Learning · Computer Science 2024-12-04 Chengting Yu , Fengzhao Zhang , Ruizhe Chen , Aili Wang , Zuozhu Liu , Shurun Tan , Er-Ping Li

Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods are quickly catching up in performance with…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Weijia Zhang , Dongnan Liu , Weidong Cai , Chao Ma

Despite the success of Deep Learning (DL), the deployment of modern DL models requiring large computational power poses a significant problem for resource-constrained systems. This necessitates building compact networks that reduce…

Machine Learning · Computer Science 2020-06-24 Akshay Kulkarni , Navid Panchi , Sharath Chandra Raparthy , Shital Chiddarwar

Logit-based knowledge distillation (KD) for classification is cost-efficient compared to feature-based KD but often subject to inferior performance. Recently, it was shown that the performance of logit-based KD can be improved by…

Computer Vision and Pattern Recognition · Computer Science 2024-09-06 Hyungkeun Park , Jong-Seok Lee

Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient…

Machine Learning · Computer Science 2022-03-22 Yen-Chang Hsu , James Smith , Yilin Shen , Zsolt Kira , Hongxia Jin

Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models mainly transfer knowledge by aligning…

Computation and Language · Computer Science 2022-11-03 Lean Wang , Lei Li , Xu Sun

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Fei Ding , Yin Yang , Hongxin Hu , Venkat Krovi , Feng Luo

Knowledge distillation (KD) transfers the dark knowledge from a complex teacher to a compact student. However, heterogeneous architecture distillation, such as Vision Transformer (ViT) to ResNet18, faces challenges due to differences in…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Liuchi Xu , Hao Zheng , Lu Wang , Lisheng Xu , Jun Cheng

Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to the feature knowledge distillation. In this paper, we argue that existing…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Shicai Wei , Chunbo Luo , Yang Luo

We introduce the problem of continual distillation learning (CDL) in order to use knowledge distillation (KD) to improve prompt-based continual learning (CL) models. The CDL problem is valuable to study since the use of a larger vision…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Qifan Zhang , Yunhui Guo , Yu Xiang

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang , Junhao Song , Xudong Han , Ziqian Bi , Tianyang Wang , Chia Xin Liang , Xinyuan Song , Yichao Zhang , Qian Niu , Benji Peng , Keyu Chen , Ming Liu

In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches exhibit…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Wencheng Zhu , Xin Zhou , Pengfei Zhu , Yu Wang , Qinghua Hu

Knowledge distillation is a key technique for transferring the capabilities of large language models (LLMs) into smaller, more efficient student models. Existing distillation approaches often overlook two critical factors: the learning…

Machine Learning · Computer Science 2026-05-13 Jincheng Cao , Fanzhi Zeng , Leqi Liu , Aryan Mokhtari

Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the teacher's intermediate layer's thought process as a black box. While feature-based distillation attempts…

Computation and Language · Computer Science 2026-02-17 Manish Dhakal , Uthman Jinadu , Anjila Budathoki , Rajshekhar Sunderraman , Yi Ding

Knowledge distillation is a mainstream algorithm in model compression by transferring knowledge from the larger model (teacher) to the smaller model (student) to improve the performance of student. Despite many efforts, existing methods…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Muhe Ding , Jianlong Wu , Xue Dong , Xiaojie Li , Pengda Qin , Tian Gan , Liqiang Nie