Multi-Label Knowledge Distillation

Machine Learning 2025-06-02 v1 Artificial Intelligence Computer Vision and Pattern Recognition

Abstract

Existing knowledge distillation methods typically transfer the knowledge in the teacher network's output logits or intermediate feature maps to the student network, an approach that has been very successful in multi-class single-label learning. However, these methods are hard to extend to multi-label learning, where each instance is associated with multiple semantic labels: the prediction probabilities do not sum to one, and feature maps of the whole example may ignore minor classes. In this paper, we propose a novel multi-label knowledge distillation method. On the one hand, it exploits the informative semantic knowledge in the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method avoids knowledge counteraction among labels and thus achieves superior performance against a diverse set of comparison methods. Our code is available at: https://github.com/penghui-yang/L2D
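
As a rough illustration of the binary decomposition described above, the following minimal sketch treats each label as its own Bernoulli distribution and sums a per-label KL divergence between teacher and student sigmoid outputs. It is not the authors' implementation (see the repository linked above for that); the function names `binary_kl` and `multi_label_logit_distillation`, the clipping constant `eps`, and the choice of an unweighted sum over labels are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_kl(p, q, eps=1e-7):
    """KL divergence between Bernoulli(p) and Bernoulli(q).

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0);
    eps is an illustrative choice, not a value from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def multi_label_logit_distillation(teacher_logits, student_logits):
    """Sum of per-label binary KL terms (hypothetical helper).

    Each label is handled as an independent binary problem, so the
    per-label probabilities are never required to sum to one.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same labels")
    return sum(
        binary_kl(sigmoid(t), sigmoid(s))
        for t, s in zip(teacher_logits, student_logits)
    )
```

Because every label contributes its own term, a confident teacher prediction on one label is not renormalized against the others, which is the difficulty a softmax-based distillation loss would run into when several labels are present at once.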

Cite

@article{arxiv.2308.06453,
  title  = {Multi-Label Knowledge Distillation},
  author = {Penghui Yang and Ming-Kun Xie and Chen-Chen Zong and Lei Feng and Gang Niu and Masashi Sugiyama and Sheng-Jun Huang},
  journal= {arXiv preprint arXiv:2308.06453},
  year   = {2023}
}

Comments

Accepted by ICCV 2023. The first two authors contributed equally to this work.
