Related papers: Explaining Knowledge Distillation by Quantifying t…

Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification

Compared to traditional learning from scratch, knowledge distillation sometimes makes the DNN achieve superior performance. This paper provides a new perspective to explain the success of knowledge distillation, i.e., quantifying knowledge…

Machine Learning · Computer Science 2022-08-19 Quanshi Zhang, Xu Cheng, Yilan Chen, Zhefan Rao

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science 2023-04-11 Minghong Gao

What is Lost in Knowledge Distillation?

Deep neural networks (DNNs) have improved NLP tasks significantly, but training and maintaining such networks could be costly. Model compression techniques, such as knowledge distillation (KD), have been proposed to address the issue;…

Computation and Language · Computer Science 2023-11-08 Manas Mohanty, Tanya Roosta, Peyman Passban

Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Knowledge distillation, which learns a lightweight student model by distilling knowledge from a cumbersome teacher model, is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network…

Computer Vision and Pattern Recognition · Computer Science 2022-10-31 Cuong Pham, Tuan Hoang, Thanh-Toan Do

Harmonizing knowledge Transfer in Neural Network with Unified Distillation

Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories…

Computer Vision and Pattern Recognition · Computer Science 2024-09-30 Yaomin Huang, Zaomin Yan, Chaomin Shen, Faming Fang, Guixu Zhang

Knowledge Distillation of Convolutional Neural Networks through Feature Map Transformation using Decision Trees

The interpretation of reasoning by Deep Neural Networks (DNN) is still challenging due to their perceived black-box nature. Therefore, deploying DNNs in several real-world tasks is restricted by the lack of transparency of these models. We…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Maddimsetti Srinivas, Debdoot Sheet

A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models

This paper aims to provide a selective survey of the knowledge distillation (KD) framework for researchers and practitioners to take advantage of it for developing new optimized models in the deep neural network field. To this end, we give a…

Machine Learning · Computer Science 2020-12-01 Jeong-Hoe Ku, JiHun Oh, YoungYoon Lee, Gaurav Pooniwala, SangJeong Lee

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning…

Computation and Language · Computer Science 2019-04-23 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

QUEST: Quantized embedding space for transferring knowledge

Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Himalaya Jain, Spyros Gidaris, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Self-Knowledge Distillation in Natural Language Processing

Since deep learning became a key player in natural language processing (NLP), many deep learning models have been showing remarkable performances in a variety of NLP tasks, and in some cases, they are even outperforming humans. Such high…

Computation and Language · Computer Science 2019-08-07 Sangchul Hahn, Heeyoul Choi

What Knowledge Gets Distilled in Knowledge Distillation?

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee

Towards a Unified View of Affinity-Based Knowledge Distillation

Knowledge transfer between artificial neural networks has become an important topic in deep learning. Among the open questions are what kind of knowledge needs to be preserved for the transfer, and how it can be effectively achieved.…

Computer Vision and Pattern Recognition · Computer Science 2022-10-03 Vladimir Li, Atsuto Maki

Graph-based Knowledge Distillation by Multi-head Attention Network

Knowledge distillation (KD) is a technique to derive optimal performance from a small student network (SN) by distilling knowledge of a large teacher network (TN) and transferring the distilled knowledge to the small SN. Since a role of…

Machine Learning · Computer Science 2019-07-10 Seunghyun Lee, Byung Cheol Song

A Comprehensive Survey on Knowledge Distillation

Deep Neural Networks (DNNs) have achieved notable performance in the fields of computer vision and natural language processing with various applications in both academia and industry. However, with recent advancements in DNNs and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Amir M. Mansourian, Rozhan Ahmadi, Masoud Ghafouri, Amir Mohammad Babaei, Elaheh Badali Golezani, Zeynab Yasamani Ghamchi, Vida Ramezanian, Alireza Taherian, Kimia Dinashi, Amirali Miri, Shohreh Kasaei

Knowledge Distillation: A Survey

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver…

Machine Learning · Computer Science 2021-05-21 Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao

Towards Understanding Knowledge Distillation

Knowledge distillation, i.e., one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much…

Machine Learning · Computer Science 2021-05-28 Mary Phuong, Christoph H. Lampert

On the Demystification of Knowledge Distillation: A Residual Network Perspective

Knowledge distillation (KD) is generally considered as a technique for performing model compression and learned-label smoothing. However, in this paper, we study and investigate the KD approach from a new perspective: we study its efficacy…

Computer Vision and Pattern Recognition · Computer Science 2020-07-01 Nandan Kumar Jha, Rajat Saini, Sparsh Mittal

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Knowledge distillation is a strategy of training a student network with the guidance of the soft output from a teacher network. It has been a successful method of model compression and knowledge transfer. However, currently knowledge distillation…

Machine Learning · Computer Science 2024-10-21 Guangda Ji, Zhanxing Zhu

Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptron (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP. Despite their great…

Machine Learning · Computer Science 2023-06-12 Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li

Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable due to some…

Computer Vision and Pattern Recognition · Computer Science 2023-07-24 Jialiang Tang, Shuo Chen, Gang Niu, Masashi Sugiyama, Chen Gong