This research note provides a quick introduction to the knowledge distillation loss function used in object classification. In particular, we discuss its connection to a previously proposed logits matching loss function. We further treat knowledge distillation as a specific form of output regularization and demonstrate its connection to label smoothing and entropy-based regularization.
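As a rough illustration of the quantities mentioned above, the following is a minimal sketch of the widely used temperature-scaled distillation loss (the KL divergence between softened teacher and student distributions, scaled by the squared temperature) alongside a label-smoothing target. It assumes the standard formulation from the distillation literature, which may differ in notation or detail from the note itself. The function names `softmax`, `kd_loss`, and `label_smoothing_target`, and the default values for `temperature` and `epsilon`, are hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature^2.

    Assumption: this is the commonly used distillation term, not
    necessarily the exact form analyzed in the note.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # softened student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def label_smoothing_target(num_classes, true_class, epsilon=0.1):
    """Mix the one-hot label with a uniform distribution over classes."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) * (1.0 if k == true_class else 0.0) + uniform
        for k in range(num_classes)
    ]


if __name__ == "__main__":
    teacher = [5.0, 2.0, -1.0]
    student = [3.0, 2.5, 0.0]
    print("distillation loss:", kd_loss(student, teacher))
    print("smoothed target:", label_smoothing_target(3, true_class=0))
```

Both targets are full probability distributions rather than one-hot labels, which is what makes it natural to view distillation and label smoothing as two forms of output regularization: label smoothing uses a fixed uniform mixture, while distillation uses a teacher-dependent soft target.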
Cite
@article{arxiv.2109.06458,
title = {A Note on Knowledge Distillation Loss Function for Object Classification},
author = {Defang Chen},
journal = {arXiv preprint arXiv:2109.06458},
year = {2021}
}