Model Compression Using Optimal Transport

Computer Vision and Pattern Recognition 2020-12-08 v1 Machine Learning

Abstract

Model compression methods are important because they ease the deployment of deep learning models in compute-, memory-, and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithms in which knowledge from a large teacher network is transferred to a smaller student network, thereby improving the student's performance. In this paper, we show how optimal transport-based loss functions can be used to train a student network; these losses encourage the student to learn parameters that bring the distribution of its features closer to that of the teacher's features. We present image classification results on CIFAR-100, SVHN, and ImageNet and show that the proposed optimal transport loss functions perform comparably to or better than other loss functions.
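To make the idea of an optimal transport loss between student and teacher features concrete, the following is a minimal sketch and not the paper's implementation. It assumes an entropy-regularized (Sinkhorn) formulation, a squared Euclidean cost, and uniform weights over the samples in a batch; the abstract specifies none of these. The names `sinkhorn_ot_loss`, `eps`, and `n_iters` are hypothetical and chosen only for illustration.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_ot_loss(student_feats, teacher_feats, eps=0.5, n_iters=200):
    """Approximate OT cost between two batches of feature vectors.

    Assumptions (not taken from the paper): uniform mass on every sample,
    squared Euclidean ground cost, entropy regularization of strength eps.
    """
    n, m = len(student_feats), len(teacher_feats)
    a = [1.0 / n] * n  # uniform mass on student samples
    b = [1.0 / m] * m  # uniform mass on teacher samples

    # Pairwise cost matrix and the corresponding Gibbs kernel.
    # Note: exp(-c / eps) can underflow for very large costs or tiny eps;
    # a log-domain variant would be more robust in practice.
    C = [[sq_dist(s, t) for t in teacher_feats] for s in student_feats]
    K = [[math.exp(-c / eps) for c in row] for row in C]

    # Sinkhorn iterations: alternately rescale rows and columns so the
    # transport plan P = diag(u) K diag(v) matches the marginals a and b.
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport cost <P, C> under the approximate plan.
    return sum(
        u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m)
    )


# Toy batches of 2-D features (illustrative values only).
teacher = [[0.0, 0.0], [10.0, 10.0]]
student_close = [[0.0, 0.0], [10.0, 10.0]]
student_shifted = [[1.0, 0.0], [10.0, 11.0]]

loss_close = sinkhorn_ot_loss(student_close, teacher)
loss_shifted = sinkhorn_ot_loss(student_shifted, teacher)
```

In a training loop, a loss of this kind would typically be computed on intermediate features with an automatic differentiation framework so that its gradient can update the student's parameters; the pure-Python version here only illustrates the quantity being minimized.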

Cite

@article{arxiv.2012.03907,
  title  = {Model Compression Using Optimal Transport},
  author = {Suhas Lohit and Michael Jones},
  journal= {arXiv preprint arXiv:2012.03907},
  year   = {2020}
}