Regularizing Class-wise Predictions via Self-knowledge Distillation

Sukmin Yun; Jongjin Park; Kimin Lee; Jinwoo Shin

Machine Learning; 2020-04-08 (v2); Computer Vision and Pattern Recognition; Machine Learning

Abstract

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes differences in the predictive distributions of similar samples. In particular, we distill the predictive distribution between different samples of the same label during training. This regularizes the dark knowledge (i.e., the knowledge on wrong predictions) of a single network (i.e., self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Our experimental results on various image classification tasks demonstrate that this simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern convolutional neural networks.
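
The abstract describes the regularizer only in words, so the following is a minimal sketch of one way such a class-wise penalty could look, not the authors' implementation (see the linked repository for that). It assumes the penalty is a KL divergence between temperature-softened softmax outputs of two samples sharing a label, added to the usual cross-entropy. The function names, the `temperature` and `weight` parameters, and the `temperature ** 2` scaling are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Standard cross-entropy of the unsoftened prediction."""
    return -math.log(softmax(logits)[label])


def class_wise_loss(logits_x, logits_pair, label, temperature=4.0, weight=1.0):
    """Cross-entropy on x plus a penalty that pulls its softened prediction
    toward that of another sample with the same label.

    logits_pair comes from the same network; in a real framework it would be
    treated as a fixed target (no gradient), which is an assumption here.
    """
    target = softmax(logits_pair, temperature)
    prediction = softmax(logits_x, temperature)
    penalty = kl_divergence(target, prediction)
    return cross_entropy(logits_x, label) + weight * temperature ** 2 * penalty


# Two hypothetical logit vectors for different images of class 0.
logits_a = [4.0, 1.0, 0.5]
logits_b = [3.0, 2.0, 0.2]
print(class_wise_loss(logits_a, logits_b, label=0))
```

When both samples yield identical predictions the penalty vanishes and the loss reduces to plain cross-entropy; the more the two same-class predictions disagree, including on the wrong classes, the larger the penalty becomes.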

Keywords

knowledge distillation; regularization; generalization in machine learning

Cite

@article{arxiv.2003.13964,
  title  = {Regularizing Class-wise Predictions via Self-knowledge Distillation},
  author = {Sukmin Yun and Jongjin Park and Kimin Lee and Jinwoo Shin},
  journal= {arXiv preprint arXiv:2003.13964},
  year   = {2020}
}

Comments

Accepted to CVPR 2020. Code is available at https://github.com/alinlab/cs-kd
