Related papers: Peer Collaborative Learning for Online Knowledge D…

Decoupled Knowledge with Ensemble Learning for Online Distillation

Offline distillation is a two-stage pipeline that requires expensive resources to train a teacher network and then distill the knowledge to a student for deployment. Online knowledge distillation, on the other hand, is a one-stage strategy…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Baitan Shao, Ying Chen

Online Knowledge Distillation with Diverse Peers

Distillation is an effective knowledge-transfer technique that uses predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high capacity teacher, however, is not always…

Machine Learning · Computer Science 2019-12-06 Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, Chun Chen

Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge via teacher-student learning, where most existing distillation approaches apply a fixed pre-trained model as teacher to supervise the learning of student network. This…

Machine Learning · Computer Science 2021-03-26 Kangkai Zhang, Chunhui Zhang, Shikun Li, Dan Zeng, Shiming Ge

Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an…

Computer Vision and Pattern Recognition · Computer Science 2021-10-25 Usma Niyaz, Deepti R. Bathula

Heterogeneous-Branch Collaborative Learning for Dialogue Generation

With the development of deep learning, advanced dialogue generation methods usually require a greater amount of computational resources. One promising approach to obtaining a high-performance and lightweight model is knowledge distillation,…

Computation and Language · Computer Science 2023-03-22 Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li

Semi-Online Knowledge Distillation

Knowledge distillation is an effective and stable method for model compression via knowledge transfer. Conventional knowledge distillation (KD) is to transfer knowledge from a large and well pre-trained teacher network to a small student…

Computer Vision and Pattern Recognition · Computer Science 2021-11-24 Zhiqiang Liu, Yanxia Liu, Chengkai Huang

Distilling Knowledge via Intermediate Classifiers

The crux of knowledge distillation is to effectively train a resource-limited student model with the guide of a pre-trained larger teacher model. However, when there is a large difference between the model complexities of teacher and…

Machine Learning · Computer Science 2021-06-01 Aryan Asadian, Amirali Salehi-Abari

ORC: Network Group-based Knowledge Distillation using Online Role Change

In knowledge distillation, since a single, omnipotent teacher network cannot solve all problems, multiple teacher-based knowledge distillations have been studied recently. However, sometimes their improvements are not as good as expected…

Machine Learning · Computer Science 2023-08-09 Junyong Choi, Hyeon Cho, Seokhwa Cheung, Wonjun Hwang

Knowledge Distillation by On-the-Fly Native Ensemble

Knowledge distillation is effective to train small and generalisable network models for meeting the low-memory and fast running requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables…

Computer Vision and Pattern Recognition · Computer Science 2018-09-11 Xu Lan, Xiatian Zhu, Shaogang Gong

Highlight Every Step: Knowledge Distillation via Collaborative Teaching

High storage and computational costs obstruct deep neural networks to be deployed on resource-constrained devices. Knowledge distillation aims to train a compact student network by transferring knowledge from a larger pre-trained teacher…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Haoran Zhao, Xin Sun, Junyu Dong, Changrui Chen, Zihe Dong

Efficient Knowledge Distillation via Curriculum Extraction

Knowledge distillation is a technique used to train a small student network using the output generated by a large teacher network, and has many empirical advantages (Hinton et al., 2015). While the standard one-shot approach to…

Machine Learning · Computer Science 2025-03-25 Shivam Gupta, Sushrut Karmalkar

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

With the success of deep neural networks, knowledge distillation which guides the learning of a small student network from a large teacher network is being actively studied for model compression and transfer learning. However, few studies…

Computer Vision and Pattern Recognition · Computer Science 2021-08-10 Wonchul Son, Jaemin Na, Junyong Choi, Wonjun Hwang

Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Knowledge distillation which learns a lightweight student model by distilling knowledge from a cumbersome teacher model is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network…

Computer Vision and Pattern Recognition · Computer Science 2022-10-31 Cuong Pham, Tuan Hoang, Thanh-Toan Do

Knowledge Distillation via Instance-level Sequence Learning

Recently, distillation approaches are suggested to extract general knowledge from a teacher network to guide a student network. Most of the existing methods transfer knowledge from the teacher network to the student via feeding the sequence…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Haoran Zhao, Xin Sun, Junyu Dong, Zihe Dong, Qiong Li

Deep Mutual Learning

Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network, that is better…

Computer Vision and Pattern Recognition · Computer Science 2017-06-02 Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu

Cooperative Knowledge Distillation: A Learner Agnostic Approach

Knowledge distillation is a simple but powerful way to transfer knowledge from a teacher model to a student model. Existing work suffers from at least one of the following key limitations in terms of direction and scope of transfer which…

Machine Learning · Computer Science 2024-02-12 Michael Livanos, Ian Davidson, Stephen Wong

Online Knowledge Distillation via Multi-branch Diversity Enhancement

Knowledge distillation is an effective method to transfer the knowledge from the cumbersome teacher model to the lightweight student model. Online knowledge distillation uses the ensembled prediction results of multiple student models as…

Computer Vision and Pattern Recognition · Computer Science 2020-11-16 Zheng Li, Ying Huang, Defang Chen, Tianren Luo, Ning Cai, Zhigeng Pan

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach

Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method since it could accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning.…

Artificial Intelligence · Computer Science 2020-02-07 Zeyue Xue, Shuang Luo, Chao Wu, Pan Zhou, Kaigui Bian, Wei Du

Channel Self-Supervision for Online Knowledge Distillation

Recently, researchers have shown an increased interest in online knowledge distillation. Adopting a one-stage and end-to-end training fashion, online knowledge distillation uses aggregated intermediated predictions of multiple peer…

Computer Vision and Pattern Recognition · Computer Science 2022-03-24 Shixiao Fan, Xuan Cheng, Xiaomin Wang, Chun Yang, Pan Deng, Minghui Liu, Jiali Deng, Ming Liu