Related papers: Knowledge distillation via adaptive instance norma…

AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting

Knowledge distillation, a widely used model compression technique, works on the basis of transferring knowledge from a cumbersome teacher model to a lightweight student model. The technique involves jointly optimizing the task specific and…

Machine Learning · Computer Science 2024-05-15 Shreyan Ganguly, Roshan Nayak, Rakshith Rao, Ujan Deb, Prathosh AP

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification

Knowledge distillation is a potential solution for model compression. The idea is to make a small student network imitate the target of a large teacher network, then the student network can be competitive to the teacher one. Most previous…

Computer Vision and Pattern Recognition · Computer Science 2017-10-24 Chong Wang, Xipeng Lan, Yangang Zhang

Similarity-Preserving Knowledge Distillation

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science 2019-08-05 Frederick Tung, Greg Mori

Distilling Object Detectors with Task Adaptive Regularization

Current state-of-the-art object detectors are at the expense of high computational costs and are hard to deploy to low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Contrastive Representation Distillation

Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a…

Machine Learning · Computer Science 2022-01-26 Yonglong Tian, Dilip Krishnan, Phillip Isola

AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression

Knowledge distillation has attracted a great deal of interest recently to compress pre-trained language models. However, existing knowledge distillation methods suffer from two limitations. First, the student model simply imitates the…

Computation and Language · Computer Science 2023-05-18 Siyue Wu, Hongzhan Chen, Xiaojun Quan, Qifan Wang, Rui Wang

Knowledge Distillation as Semiparametric Inference

A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to…

Machine Learning · Statistics 2021-04-21 Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Knowledge distillation is to transfer the knowledge from the data learned by the teacher network to the student network, so that the student has the advantage of less parameters and less calculations, and the accuracy is close to the…

Machine Learning · Computer Science 2020-06-03 Zaida Zhou, Chaoran Zhuge, Xinwei Guan, Wen Liu

A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

Knowledge distillation is a popular technique for transferring the knowledge from a large teacher model to a smaller student model by mimicking. However, distillation by directly aligning the feature maps between teacher and student may…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Ziwei Liu, Yongtao Wang, Xiaojie Chu

Learning Student-Friendly Teacher Networks for Knowledge Distillation

We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained…

Machine Learning · Computer Science 2022-01-25 Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Dae Sin Kim, Bohyung Han

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed…

Machine Learning · Computer Science 2020-11-24 Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

Student Network Learning via Evolutionary Knowledge Distillation

Knowledge distillation provides an effective way to transfer knowledge via teacher-student learning, where most existing distillation approaches apply a fixed pre-trained model as teacher to supervise the learning of student network. This…

Machine Learning · Computer Science 2021-03-26 Kangkai Zhang, Chunhui Zhang, Shikun Li, Dan Zeng, Shiming Ge

Relational Representation Distillation

Knowledge distillation involves transferring knowledge from large, cumbersome teacher models to more compact student models. The standard approach minimizes the Kullback-Leibler (KL) divergence between the probabilistic outputs of a teacher…

Computer Vision and Pattern Recognition · Computer Science 2025-05-14 Nikolaos Giakoumoglou, Tania Stathaki

Class-Incremental Learning by Knowledge Distillation with Adaptive Feature Consolidation

We present a novel class incremental learning approach based on deep neural networks, which continually learns new tasks with limited memory for storing examples in the previous tasks. Our algorithm is based on knowledge distillation and…

Machine Learning · Computer Science 2022-04-05 Minsoo Kang, Jaeyoo Park, Bohyung Han

Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable due to some…

Computer Vision and Pattern Recognition · Computer Science 2023-07-24 Jialiang Tang, Shuo Chen, Gang Niu, Masashi Sugiyama, Chen Gong

A Survey on Recent Teacher-student Learning Studies

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science 2023-04-11 Minghong Gao

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching

Knowledge distillation extracts general knowledge from a pre-trained teacher network and provides guidance to a target student network. Most studies manually tie intermediate features of the teacher and student, and transfer knowledge…

Machine Learning · Computer Science 2021-02-08 Mingi Ji, Byeongho Heo, Sungrae Park

QUEST: Quantized embedding space for transferring knowledge

Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Himalaya Jain, Spyros Gidaris, Nikos Komodakis, Patrick Pérez, Matthieu Cord