Relational Knowledge Distillation

Computer Vision and Pattern Recognition; Machine Learning. 2019-05-02 (v2)

Abstract

Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular, for metric learning, it allows students to outperform their teachers, achieving state-of-the-art results on standard benchmark datasets.
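
The abstract names distance-wise and angle-wise losses without giving their exact form, so the following is only a minimal sketch of the idea. It assumes pairwise Euclidean distances normalized by their mean, cosines of the angles formed by triplets of examples, and a Huber penalty between teacher and student relations; these specifics, and all function names, are assumptions made for illustration rather than details taken from the text above.

```python
import math
from itertools import combinations


def huber(x, y, delta=1.0):
    """Huber penalty between two scalars (an assumed choice of penalty)."""
    d = abs(x - y)
    return 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)


def normalized_distances(embs):
    """Pairwise Euclidean distances divided by their mean (needs >= 2 points)."""
    dists = [math.dist(a, b) for a, b in combinations(embs, 2)]
    mean = sum(dists) / len(dists)
    return [d / mean for d in dists] if mean > 0 else dists


def triplet_cosines(embs):
    """Cosine of the angle at each vertex for every triplet (needs >= 3 points)."""
    cosines = []
    for j, vertex in enumerate(embs):
        others = [e for idx, e in enumerate(embs) if idx != j]
        for a, b in combinations(others, 2):
            u = [x - v for x, v in zip(a, vertex)]
            w = [x - v for x, v in zip(b, vertex)]
            norm = math.hypot(*u) * math.hypot(*w)
            dot = sum(p * q for p, q in zip(u, w))
            # Degenerate triplets (coincident points) contribute a zero cosine.
            cosines.append(dot / norm if norm > 0 else 0.0)
    return cosines


def mean_huber(t_vals, s_vals):
    """Average Huber penalty over matched teacher/student relation values."""
    return sum(huber(t, s) for t, s in zip(t_vals, s_vals)) / len(t_vals)


def rkd_distance_loss(teacher, student):
    """Penalize differences in normalized pairwise distances."""
    return mean_huber(normalized_distances(teacher), normalized_distances(student))


def rkd_angle_loss(teacher, student):
    """Penalize differences in triplet angles (compared via cosines)."""
    return mean_huber(triplet_cosines(teacher), triplet_cosines(student))


# Hypothetical embeddings of the same four examples.
teacher = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
scaled = [(3.0 * x, 3.0 * y) for x, y in teacher]          # same structure
distorted = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 0.0)]  # different structure

print(rkd_distance_loss(teacher, scaled), rkd_angle_loss(teacher, scaled))
print(rkd_distance_loss(teacher, distorted), rkd_angle_loss(teacher, distorted))
```

In this sketch a student whose embeddings are a uniformly scaled copy of the teacher's incurs essentially no loss, because only the relations between examples are compared, not the individual outputs. For the same reason the teacher and student embeddings could have different dimensionality.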

Cite

@article{arxiv.1904.05068,
  title  = {Relational Knowledge Distillation},
  author = {Wonpyo Park and Dongju Kim and Yan Lu and Minsu Cho},
  journal= {arXiv preprint arXiv:1904.05068},
  year   = {2019}
}

Comments

Accepted to CVPR 2019