Deep Mutual Learning

Computer Vision and Pattern Recognition 2017-06-02 v1

Abstract

Model distillation is an effective and widely used technique for transferring knowledge from a teacher network to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast-execution requirements. In this paper, we present a deep mutual learning (DML) strategy in which, rather than one-way transfer from a static, pre-defined teacher to a student, an ensemble of students learns collaboratively and the students teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on the CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
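The abstract does not spell out the training objective, so the following is only a minimal sketch of how two students might teach each other. It assumes that each student minimizes its usual supervised cross-entropy plus a KL-divergence term that pulls its predictions toward those of its peer, with the peer's predictions treated as a fixed target. The function names (`softmax`, `cross_entropy`, `kl_divergence`, `mutual_losses`) and the two-student setup are illustrative choices, not taken from the text above.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Supervised loss: negative log-probability of the true class.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # D_KL(p || q): how far q is from the target distribution p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_losses(logits_1, logits_2, label):
    # Assumed objective: each student adds a mimicry term that uses the
    # peer's predicted distribution as the (fixed) target.
    p1 = softmax(logits_1)
    p2 = softmax(logits_2)
    loss_1 = cross_entropy(p1, label) + kl_divergence(p2, p1)
    loss_2 = cross_entropy(p2, label) + kl_divergence(p1, p2)
    return loss_1, loss_2

# Hypothetical logits from two students for one 3-class example.
loss_1, loss_2 = mutual_losses([2.0, 0.5, -1.0], [0.1, 1.5, 0.3], label=0)
```

In this sketch neither student is a pre-trained teacher: both start untrained and each loss depends on the other's current predictions, which is the contrast with one-way distillation that the abstract draws. When the two students agree exactly, the mimicry terms vanish and each loss reduces to plain cross-entropy.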

Cite

@article{arxiv.1706.00384,
  title  = {Deep Mutual Learning},
  author = {Ying Zhang and Tao Xiang and Timothy M. Hospedales and Huchuan Lu},
  journal= {arXiv preprint arXiv:1706.00384},
  year   = {2017}
}

Comments

10 pages, 4 figures