Related papers: Self-supervised Knowledge Distillation Using Singu…

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

Deep neural models in recent years have been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding more…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-18 · Lin Wang, Kuk-Jin Yoon

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science · 2023-04-11 · Minghong Gao

Self-Knowledge Distillation via Dropout

To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-12 · Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang

Attention Distillation: self-supervised vision transformer students need more guidance

Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory- and compute-constrained devices is therefore an important research topic. However, how to distill…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-04 · Kai Wang, Fei Yang, Joost van de Weijer

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning…

Computation and Language · Computer Science · 2019-04-23 · Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks

Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability. Inspired by knowledge distillation (KD), recent research has improved the…

Machine Learning · Computer Science · 2024-06-13 · Lin Zuo, Yongqi Ding, Mengmeng Jing, Kunshan Yang, Yunqian Yu

Self-Referenced Deep Learning

Knowledge distillation is an effective approach to transferring knowledge from a teacher neural network to a student target network to satisfy the low-memory and fast-running requirements of practical use. Whilst being able to create…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-20 · Xu Lan, Xiatian Zhu, Shaogang Gong

An Embarrassingly Simple Approach for Knowledge Distillation

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-10 · Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Knowledge Distillation with Deep Supervision

Knowledge distillation aims to enhance the performance of a lightweight student model by exploiting the knowledge from a pre-trained cumbersome teacher model. However, in traditional knowledge distillation, teacher predictions are only…

Machine Learning · Computer Science · 2023-05-26 · Shiya Luo, Defang Chen, Can Wang

A New Training Framework for Deep Neural Network

Knowledge distillation is the process of transferring knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains performance close to that of the…

Machine Learning · Computer Science · 2021-03-26 · Zhenyan Hou, Wenxuan Fan

Explaining Knowledge Distillation by Quantifying the Knowledge

This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts that are encoded in intermediate layers of a deep neural network (DNN). More…

Machine Learning · Computer Science · 2020-03-26 · Xu Cheng, Zhefan Rao, Yilan Chen, Quanshi Zhang

Data Upcycling Knowledge Distillation for Image Super-Resolution

Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models. However, current KD methods for super-resolution (SR) networks overlook…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-30 · Yun Zhang, Wei Li, Simiao Li, Hanting Chen, Zhijun Tu, Wenjia Wang, Bingyi Jing, Shaohui Lin, Jie Hu

Embedded Knowledge Distillation in Depth-Level Dynamic Neural Network

In real applications, devices with different computational resources need networks of different depths (e.g., ResNet-18/34/50) with high accuracy. Usually, existing methods either design multiple networks and train them independently, or construct…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-11 · Qi Zhao, Shuchang Lyu, Zhiwei Zhang, Ting-Bing Xu, Guangliang Cheng

Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model using elaborate techniques, which often require transcription as extra input during training.…

Computation and Language · Computer Science · 2023-04-21 · Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li

Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution

Knowledge distillation (KD), which can efficiently transfer knowledge from a cumbersome network (teacher) to a compact network (student), has demonstrated its advantages in some computer vision applications. The representation of knowledge…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-19 · Han Zhu, Zhenzhong Chen, Shan Liu

What is Lost in Knowledge Distillation?

Deep neural networks (DNNs) have improved NLP tasks significantly, but training and maintaining such networks can be costly. Model compression techniques, such as knowledge distillation (KD), have been proposed to address the issue;…

Computation and Language · Computer Science · 2023-11-08 · Manas Mohanty, Tanya Roosta, Peyman Passban

Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student…

Computation and Language · Computer Science · 2023-03-13 · Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

Self-Knowledge Distillation with Progressive Refinement of Targets

The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In…

Machine Learning · Computer Science · 2021-10-08 · Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, Sangheum Hwang

A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition

Knowledge distillation effectively transfers knowledge from a heavy network (teacher) to a small network (student) to boost the student's performance. Self-knowledge distillation, a special case of knowledge distillation, has been…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-07 · Duc-Quang Vu, Trang Phung, Jia-Ching Wang

Distilling Spikes: Knowledge Distillation in Spiking Neural Networks

Spiking Neural Networks (SNN) are energy-efficient computing architectures that exchange spikes for processing information, unlike classical Artificial Neural Networks (ANN). Due to this, SNNs are better suited for real-life deployments.…

Neural and Evolutionary Computing · Computer Science · 2020-05-04 · Ravi Kumar Kushawaha, Saurabh Kumar, Biplab Banerjee, Rajbabu Velmurugan