Related papers: Multi-Label Knowledge Distillation

Knowledge Distillation from Single to Multi Labels: an Empirical Study

Knowledge distillation (KD) has been extensively studied in single-label image classification. However, its efficacy for multi-label classification remains relatively unexplored. In this study, we first investigate the effectiveness of…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-16 · Youcai Zhang, Yuzhuo Qin, Hengwei Liu, Yanhao Zhang, Yaqian Li, Xiaodong Gu
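
The entry above asks how well KD carries over to multi-label classification. As a hedged illustration only, and not necessarily any method the authors evaluate, one simple baseline treats each label as an independent yes/no decision: teacher and student logits go through a sigmoid (not a softmax, since labels are not mutually exclusive), and the student is penalized by the per-label binary KL divergence to the teacher. The function names and the `temperature` parameter are assumptions introduced for this sketch.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_kl(p: float, q: float, eps: float = 1e-12) -> float:
    # KL(Bernoulli(p) || Bernoulli(q)); clamping avoids log(0) and division by zero.
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def multilabel_kd_loss(teacher_logits: Sequence[float],
                       student_logits: Sequence[float],
                       temperature: float = 1.0) -> float:
    # Hypothetical baseline: compare per-label sigmoid probabilities
    # instead of a single softmax distribution over all labels.
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("teacher and student must score the same non-empty label set")
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        p_teacher = sigmoid(t / temperature)
        p_student = sigmoid(s / temperature)
        total += binary_kl(p_teacher, p_student)
    # Average over labels so the loss scale does not grow with the label count.
    return total / len(teacher_logits)


teacher = [2.0, -1.0, 0.5]   # toy logits for three labels
student = [1.0, -0.5, 0.0]
print(multilabel_kd_loss(teacher, student, temperature=2.0))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as their per-label probabilities diverge.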

Selective Cross-Task Distillation

The proliferation of pre-trained models empowers knowledge distillation by providing abundant teacher resources, but there is no well-developed mechanism for making adequate use of these teachers. With a massive model repository composed of…

Machine Learning · Computer Science · 2022-09-29 · Su Lu, Han-Jia Ye, De-Chuan Zhan

Knowledge Distillation with Refined Logits

Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang

Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

In this paper, we introduce a novel knowledge distillation approach for the semantic segmentation task. Unlike previous methods that rely on powerful pre-trained teachers or other modalities to provide additional knowledge, our approach does not…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-19 · Shoumeng Qiu, Jie Chen, Xinrun Li, Ru Wan, Xiangyang Xue, Jian Pu

Unified and Effective Ensemble Knowledge Distillation

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science · 2022-04-04 · Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
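
The abstract describes compressing several teachers into one student. A minimal sketch of the most basic way to do that, assuming a uniform average of the teachers' temperature-softened class distributions as the soft target, follows; the paper's own weighting and its use of unlabeled data are not reproduced here, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled softmax; subtracting the max keeps exp() from overflowing.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def ensemble_soft_targets(teacher_logits: Sequence[Sequence[float]],
                          temperature: float = 1.0) -> List[float]:
    # Assumed scheme: uniform average of each teacher's softened class distribution.
    probs = [softmax(t, temperature) for t in teacher_logits]
    num_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # KL(p || q); classes where the target p is zero contribute nothing.
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)


teachers = [[2.0, 0.5, -1.0], [1.0, 1.5, -0.5]]   # toy logits from two teachers
target = ensemble_soft_targets(teachers, temperature=2.0)
student = softmax([1.5, 1.0, -1.0], temperature=2.0)
print(kl_divergence(target, student))   # the student would be trained to reduce this
```

Uniform averaging is the simplest choice; weighting teachers per sample is a common refinement, but which scheme this paper adopts is not stated in the truncated abstract.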

The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework

In the context of label-efficient learning on video data, the distillation method and the structural design of the teacher-student architecture have a significant impact on knowledge distillation. However, the relationship between these…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-18 · Chao Wang, Zheng Tang

PILE: Pairwise Iterative Logits Ensemble for Multi-Teacher Labeled Distillation

Pre-trained language models have become a crucial part of ranking systems and have recently achieved impressive results. To maintain high performance while keeping computation efficient, knowledge distillation is widely used. In this…

Information Retrieval · Computer Science · 2022-11-14 · Lianshang Cai, Linhao Zhang, Dehong Ma, Jun Fan, Daiting Shi, Yi Wu, Zhicong Cheng, Simiu Gu, Dawei Yin

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

This paper presents a novel knowledge distillation method for dialogue sequence labeling. Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for…

Computation and Language · Computer Science · 2021-11-23 · Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura

Class-aware Information for Logit-based Knowledge Distillation

Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-29 · Shuoxi Zhang, Hanpeng Liu, John E. Hopcroft, Kun He

Self-Knowledge Distillation for Learning Ambiguity

Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently…

Computation and Language · Computer Science · 2024-06-17 · Hancheol Park, Soyeong Jeong, Sukmin Cho, Jong C. Park

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning

Incremental learning aims to achieve good performance on new categories without forgetting old ones. Knowledge distillation has been shown to be critical for preserving performance on old classes. Conventional methods, however,…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-08 · Peng Zhou, Long Mai, Jianming Zhang, Ning Xu, Zuxuan Wu, Larry S. Davis

Knowledge Distillation from Internal Representations

Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as…

Computation and Language · Computer Science · 2020-01-17 · Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Chenlei Guo
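
The abstract starts from the conventional setup in which the student mimics the teacher's output probabilities. For reference, a sketch of that output-level baseline in its widely used temperature-scaled form (Hinton et al., 2015) is given below: a KL term between softened teacher and student distributions plus ordinary cross-entropy on the true label. This is not the internal-representation method the paper proposes, and the defaults `temperature=4.0` and `alpha=0.5` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled softmax with max subtraction for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            true_label: int,
            temperature: float = 4.0,
            alpha: float = 0.5,
            eps: float = 1e-12) -> float:
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / max(ps, eps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Hard term: ordinary cross-entropy with the ground-truth label at T = 1.
    hard = -math.log(max(softmax(student_logits)[true_label], eps))
    # T^2 keeps the soft-term gradients on a scale comparable to the hard term.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


print(kd_loss([1.0, 2.0, 0.5], [0.5, 3.0, -1.0], true_label=1))
```

Raising the temperature flattens the teacher's distribution, which exposes how it ranks the incorrect classes; that relative information is what the soft term transfers.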

Explicit and Implicit Knowledge Distillation via Unlabeled Data

Data-free knowledge distillation is a challenging model-compression task for scenarios in which the original dataset is not available. Previous methods incur substantial extra computational cost to update one or more generators and their…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-24 · Yuzheng Wang, Zuhao Ge, Zhaoyu Chen, Xian Liu, Chuangjia Ma, Yunquan Sun, Lizhe Qi

Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study

This work aims to empirically clarify a recently proposed view that label smoothing is incompatible with knowledge distillation. We begin by explaining how this incompatibility is thought to arise, i.e., label…

Machine Learning · Computer Science · 2021-04-02 · Zhiqiang Shen, Zechun Liu, Dejia Xu, Zitian Chen, Kwang-Ting Cheng, Marios Savvides
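
As background for the entry above: label smoothing, in its common form, replaces the one-hot target y over K classes with (1 − ε)y + ε/K, so every wrong class receives probability ε/K. The sketch below builds that target. It shows only the smoothing step, not the paper's experiments on how smoothing interacts with distillation, and ε = 0.1 is an illustrative assumption.

```python
from typing import List


def smooth_labels(true_label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    # Label smoothing: mix the one-hot target with a uniform distribution,
    # y_k = (1 - epsilon) * onehot_k + epsilon / K.
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if not 0 <= true_label < num_classes:
        raise ValueError("true_label must be a valid class index")
    off_value = epsilon / num_classes
    return [(1.0 - epsilon) + off_value if k == true_label else off_value
            for k in range(num_classes)]


print(smooth_labels(true_label=0, num_classes=4, epsilon=0.1))
```

With ε = 0 the function returns the plain one-hot target, and the result always sums to 1 for any valid ε.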

Classification of Diabetic Retinopathy Using Unlabeled Data and Knowledge Distillation

Knowledge distillation allows knowledge to be transferred from one pre-trained model to another. However, it has limitations, including the constraint that the two models be architecturally similar. Knowledge distillation addresses…

Image and Video Processing · Electrical Eng. & Systems · 2020-09-03 · Sajjad Abbasi, Mohsen Hajabdollahi, Pejman Khadivi, Nader Karimi, Roshanak Roshandel, Shahram Shirani, Shadrokh Samavi

Can a student Large Language Model perform as well as its teacher?

The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has inadvertently introduced deployment challenges in resource-constrained environments. Knowledge distillation, a technique aiming to…

Machine Learning · Computer Science · 2023-10-05 · Sia Gholami, Marwan Omar

Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless,…

Computer Vision and Pattern Recognition · Computer Science · 2019-02-22 · Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan

Label driven Knowledge Distillation for Federated Learning with non-IID Data

In real-world applications, Federated Learning (FL) faces two challenges: (1) scalability, especially when applied to massive IoT networks; and (2) robustness in environments with heterogeneous data. Realizing the first…

Machine Learning · Computer Science · 2022-10-03 · Minh-Duong Nguyen, Quoc-Viet Pham, Dinh Thai Hoang, Long Tran-Thanh, Diep N. Nguyen, Won-Joo Hwang

Parameter-Free Logit Distillation via Sorting Mechanism

Knowledge distillation (KD) aims to transfer knowledge from a larger teacher model to a smaller student model via soft labels, yielding an efficient neural network. In general, the performance of a model is measured by accuracy, which is…

Signal Processing · Electrical Eng. & Systems · 2025-08-25 · Stephen Ekaputra Limantoro

Scale Decoupled Distillation

Logit knowledge distillation has attracted increasing attention in recent studies because of its practicality. However, it often underperforms feature-based knowledge distillation. In this paper, we argue that existing…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-21 · Shicai Wei, Chunbo Luo, Yang Luo