English
Related papers

Related papers: Efficient Sub-structured Knowledge Distillation

200 papers

Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a more fine-grained one (the student). The objective function of knowledge distillation is typically the…

Computation and Language · Computer Science 2021-06-03 Xinyu Wang , Yong Jiang , Zhaohui Yan , Zixia Jia , Nguyen Bach , Tao Wang , Zhongqiang Huang , Fei Huang , Kewei Tu

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang , Yi Lu , Yongyu Mu , Yimin Hu , Tong Xiao , Jingbo Zhu

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen , Jian-Ping Mei , Hailin Zhang , Can Wang , Yan Feng , Chun Chen

In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Yifan Liu , Changyong Shun , Jingdong Wang , Chunhua Shen

Knowledge distillation is an effective approach to learn compact models (students) with the supervision of large and strong models (teachers). As empirically there exists a strong correlation between the performance of teacher and student…

Machine Learning · Computer Science 2022-10-13 Chaofei Wang , Qisen Yang , Rui Huang , Shiji Song , Gao Huang

Knowledge distillation aims to enhance the performance of a lightweight student model by exploiting the knowledge from a pre-trained cumbersome teacher model. However, in the traditional knowledge distillation, teacher predictions are only…

Machine Learning · Computer Science 2023-05-26 Shiya Luo , Defang Chen , Can Wang

We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained…

Machine Learning · Computer Science 2022-01-25 Dae Young Park , Moon-Hyun Cha , Changwook Jeong , Dae Sin Kim , Bohyung Han

Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like…

Machine Learning · Statistics 2015-10-09 George Papamakarios

Knowledge distillation is used, in generative language modeling, to train a smaller student model using the help of a larger teacher model, resulting in improved capabilities for the student model. In this paper, we formulate a more general…

Computation and Language · Computer Science 2025-02-26 Guanlin Liu , Anand Ramachandran , Tanmay Gangwani , Yan Fu , Abhinav Sethy

Knowledge distillation involves transferring the predictive capabilities of large, high-performing AI models (teachers) to smaller models (students) that can operate in environments with limited computing power. In this paper, we address…

Machine Learning · Computer Science 2026-01-12 Pattarawat Chormai , Ali Hashemi , Klaus-Robert Müller , Grégoire Montavon

Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an important technique for model compression and transfer learning. Unlike previous…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Guodong Xu , Ziwei Liu , Xiaoxiao Li , Chen Change Loy

Many natural language processing tasks can be modeled into structured prediction and solved as a search problem. In this paper, we distill an ensemble of multiple models trained with different initialization into a single model. In addition…

Computation and Language · Computer Science 2018-05-30 Yijia Liu , Wanxiang Che , Huaipeng Zhao , Bing Qin , Ting Liu

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation…

Computation and Language · Computer Science 2020-12-15 Fei Yuan , Linjun Shou , Jian Pei , Wutao Lin , Ming Gong , Yan Fu , Daxin Jiang

Unlike existing knowledge distillation methods focus on the baseline settings, where the teacher models and training strategies are not that strong and competing as state-of-the-art approaches, this paper presents a method dubbed DIST to…

Computer Vision and Pattern Recognition · Computer Science 2022-12-29 Tao Huang , Shan You , Fei Wang , Chen Qian , Chang Xu

Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a…

Computation and Language · Computer Science 2020-05-04 Linqing Liu , Huan Wang , Jimmy Lin , Richard Socher , Caiming Xiong

Knowledge distillation is the process of transferring the knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains the performance close to that of the…

Machine Learning · Computer Science 2021-03-26 Zhenyan Hou , Wenxuan Fan

Knowledge distillation is the procedure of transferring "knowledge" from a large model (the teacher) to a more compact one (the student), often being used in the context of model compression. When both models have the same architecture,…

Machine Learning · Computer Science 2022-06-20 Minh Pham , Minsu Cho , Ameya Joshi , Chinmay Hegde

A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to…

Machine Learning · Statistics 2021-04-21 Tri Dao , Govinda M Kamath , Vasilis Syrgkanis , Lester Mackey

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz , Elahe Arani , Bahram Zonooz

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed…

Machine Learning · Computer Science 2020-11-24 Hadi Pouransari , Mojan Javaheripi , Vinay Sharma , Oncel Tuzel
‹ Prev 1 2 3 10 Next ›