Related papers: Recurrent knowledge distillation

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

With the success of deep neural networks, knowledge distillation which guides the learning of a small student network from a large teacher network is being actively studied for model compression and transfer learning. However, few studies…

Computer Vision and Pattern Recognition · Computer Science 2021-08-10 Wonchul Son, Jaemin Na, Junyong Choi, Wonjun Hwang

Improved Knowledge Distillation via Teacher Assistant

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress…

Machine Learning · Computer Science 2019-12-18 Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, Hassan Ghasemzadeh

FitNets: Hints for Thin Deep Nets

While depth tends to improve network performances, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and…

Machine Learning · Computer Science 2015-03-30 Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio

Knowledge Distillation via Instance-level Sequence Learning

Recently, distillation approaches are suggested to extract general knowledge from a teacher network to guide a student network. Most of the existing methods transfer knowledge from the teacher network to the student via feeding the sequence…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Haoran Zhao, Xin Sun, Junyu Dong, Zihe Dong, Qiong Li

ResKD: Residual-Guided Knowledge Distillation

Knowledge distillation, aimed at transferring the knowledge from a heavy teacher network to a lightweight student network, has emerged as a promising technique for compressing neural networks. However, due to the capacity gap between the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-01 Xuewei Li, Songyuan Li, Bourahla Omar, Fei Wu, Xi Li

Distilling Knowledge via Knowledge Review

Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network. Previous methods mostly focus on proposing feature transformation and loss…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia

Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Knowledge distillation which learns a lightweight student model by distilling knowledge from a cumbersome teacher model is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network…

Computer Vision and Pattern Recognition · Computer Science 2022-10-31 Cuong Pham, Tuan Hoang, Thanh-Toan Do

Few Sample Knowledge Distillation for Efficient Network Compression

Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning suffers from the…

Machine Learning · Computer Science 2020-04-01 Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Despite the different types, the guided knowledge used to distill is always kept unchanged for…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge via teacher-student learning, where most existing distillation approaches apply a fixed pre-trained model as teacher to supervise the learning of student network. This…

Machine Learning · Computer Science 2021-03-26 Kangkai Zhang, Chunhui Zhang, Shikun Li, Dan Zeng, Shiming Ge

Distillation from heterogeneous unlabeled collections

Compressing deep networks is essential to expand their range of applications to constrained settings. The need for compression however often arises long after the model was trained, when the original data might no longer be available. On…

Machine Learning · Computer Science 2022-01-19 Jean-Michel Begon, Pierre Geurts

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Distilling Calibrated Student from an Uncalibrated Teacher

Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which in general, is comparatively large and deep. These teacher networks are…

Computer Vision and Pattern Recognition · Computer Science 2023-02-23 Ishan Mishra, Sethu Vamsi Krishna, Deepak Mishra

Similarity-Preserving Knowledge Distillation

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science 2019-08-05 Frederick Tung, Greg Mori

Activation Map Adaptation for Effective Knowledge Distillation

Model compression becomes a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge…

Computer Vision and Pattern Recognition · Computer Science 2022-04-15 Zhiyuan Wu, Hong Qi, Yu Jiang, Minghao Zhao, Chupeng Cui, Zongmin Yang, Xinhui Xue

Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Knowledge distillation is to transfer the knowledge from the data learned by the teacher network to the student network, so that the student has the advantage of less parameters and less calculations, and the accuracy is close to the…

Machine Learning · Computer Science 2020-06-03 Zaida Zhou, Chaoran Zhuge, Xinwei Guan, Wen Liu

On the Demystification of Knowledge Distillation: A Residual Network Perspective

Knowledge distillation (KD) is generally considered as a technique for performing model compression and learned-label smoothing. However, in this paper, we study and investigate the KD approach from a new perspective: we study its efficacy…

Computer Vision and Pattern Recognition · Computer Science 2020-07-01 Nandan Kumar Jha, Rajat Saini, Sparsh Mittal

Distilling Lightweight Domain Experts from Large ML Models by Identifying Relevant Subspaces

Knowledge distillation involves transferring the predictive capabilities of large, high-performing AI models (teachers) to smaller models (students) that can operate in environments with limited computing power. In this paper, we address…

Machine Learning · Computer Science 2026-01-12 Pattarawat Chormai, Ali Hashemi, Klaus-Robert Müller, Grégoire Montavon

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science 2023-04-11 Minghong Gao

Training convolutional neural networks with cheap convolutions and online distillation

The large memory and computation consumption in convolutional neural networks (CNNs) has been one of the main barriers for deploying them on resource-limited systems. To this end, most cheap convolutions (e.g., group convolution, depth-wise…

Computer Vision and Pattern Recognition · Computer Science 2019-10-11 Jiao Xie, Shaohui Lin, Yichen Zhang, Linkai Luo