Related papers: Multi-granularity for knowledge distillation

Multi-Modality Distillation via Learning the teacher's modality-level Gram Matrix

In the context of multi-modality knowledge distillation research, the existing methods was mainly focus on the problem of only learning teacher final output. Thus, there are still deep differences between the teacher network and the student…

Artificial Intelligence · Computer Science 2026-05-08 Peng Liu

Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge via teacher-student learning, where most existing distillation approaches apply a fixed pre-trained model as teacher to supervise the learning of student network. This…

Machine Learning · Computer Science 2021-03-26 Kangkai Zhang , Chunhui Zhang , Shikun Li , Dan Zeng , Shiming Ge

Learning Student-Friendly Teacher Networks for Knowledge Distillation

We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained…

Machine Learning · Computer Science 2022-01-25 Dae Young Park , Moon-Hyun Cha , Changwook Jeong , Dae Sin Kim , Bohyung Han

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Despite the different types, the guided knowledge used to distill is always kept unchanged for…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Jiangfan Han , Mengya Gao , Yujie Wang , Quanquan Li , Hongsheng Li , Xiaogang Wang

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang , Yi Lu , Yongyu Mu , Yimin Hu , Tong Xiao , Jingbo Zhu

What Knowledge Gets Distilled in Knowledge Distillation?

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has a been a deluge of novel…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Utkarsh Ojha , Yuheng Li , Anirudh Sundara Rajan , Yingyu Liang , Yong Jae Lee

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science 2019-08-05 Frederick Tung , Greg Mori

Efficient Knowledge Distillation via Curriculum Extraction

Knowledge distillation is a technique used to train a small student network using the output generated by a large teacher network, and has many empirical advantages~\citep{Hinton2015DistillingTK}. While the standard one-shot approach to…

Machine Learning · Computer Science 2025-03-25 Shivam Gupta , Sushrut Karmalkar

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science 2023-04-11 Minghong Gao

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy…

Machine Learning · Computer Science 2019-05-21 Linfeng Zhang , Jiebo Song , Anni Gao , Jingwei Chen , Chenglong Bao , Kaisheng Ma

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

With the success of deep neural networks, knowledge distillation which guides the learning of a small student network from a large teacher network is being actively studied for model compression and transfer learning. However, few studies…

Computer Vision and Pattern Recognition · Computer Science 2021-08-10 Wonchul Son , Jaemin Na , Junyong Choi , Wonjun Hwang

Does Knowledge Distillation Really Work?

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not…

Machine Learning · Computer Science 2021-12-07 Samuel Stanton , Pavel Izmailov , Polina Kirichenko , Alexander A. Alemi , Andrew Gordon Wilson

Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification

Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Aakash Gore , Anoushka Dey , Aryan Mishra

Gradient Knowledge Distillation for Pre-trained Language Models

Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models mainly transfer knowledge by aligning…

Computation and Language · Computer Science 2022-11-03 Lean Wang , Lei Li , Xu Sun

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed…

Machine Learning · Computer Science 2020-11-24 Hadi Pouransari , Mojan Javaheripi , Vinay Sharma , Oncel Tuzel

Reinforced Multi-Teacher Selection for Knowledge Distillation

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation…

Computation and Language · Computer Science 2020-12-15 Fei Yuan , Linjun Shou , Jian Pei , Wutao Lin , Ming Gong , Yan Fu , Daxin Jiang

Variational Information Distillation for Knowledge Transfer

Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Sungsoo Ahn , Shell Xu Hu , Andreas Damianou , Neil D. Lawrence , Zhenwen Dai

Knowledge Distillation Meets Self-Supervision

Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an important technique for model compression and transfer learning. Unlike previous…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Guodong Xu , Ziwei Liu , Xiaoxiao Li , Chen Change Loy

Supervision Complexity and its Role in Knowledge Distillation

Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages…

Machine Learning · Computer Science 2023-01-31 Hrayr Harutyunyan , Ankit Singh Rawat , Aditya Krishna Menon , Seungyeon Kim , Sanjiv Kumar

On effects of Knowledge Distillation on Transfer Learning

Knowledge distillation is a popular machine learning technique that aims to transfer knowledge from a large 'teacher' network to a smaller 'student' network and improve the student's performance by training it to emulate the teacher. In…

Machine Learning · Computer Science 2022-10-19 Sushil Thapa