Related papers: Does Knowledge Distillation Really Work?

200 papers

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee

In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. Starting with the observation that more accurate teachers often don't make good teachers, we…

Machine Learning · Computer Science 2019-10-04 Jang Hyun Cho, Bharath Hariharan

Does Knowledge Distillation (KD) really work? Conventional wisdom viewed it as a knowledge transfer procedure where a perfect mimicry of the student to its teacher is desired. However, paradoxical studies indicate that closely replicating…

Machine Learning · Computer Science 2024-05-03 Chenqi Guo, Shiwei Zhong, Xiaofeng Liu, Qianli Feng, Yinglong Ma

Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a…

Computer Vision and Pattern Recognition · Computer Science 2019-08-05 Frederick Tung, Greg Mori

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model. Several works have shown that distillation significantly boosts the…

Machine Learning · Computer Science 2021-07-09 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is the procedure of transferring "knowledge" from a large model (the teacher) to a more compact one (the student), often being used in the context of model compression. When both models have the same architecture,…

Machine Learning · Computer Science 2022-06-20 Minh Pham, Minsu Cho, Ameya Joshi, Chinmay Hegde

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Training a small student network with the guidance of a larger teacher network is an effective way to promote the performance of the student. Despite the different types, the guided knowledge used to distill is always kept unchanged for…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Recent years have witnessed dramatic improvements in knowledge distillation, which can generate a compact student model for better efficiency while retaining the model effectiveness of the teacher model. Previous studies find that:…

Computer Vision and Pattern Recognition · Computer Science 2021-11-04 Lehan Yang, Jincen Song

Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are…

Computer Vision and Pattern Recognition · Computer Science 2023-02-23 Ishan Mishra, Sethu Vamsi Krishna, Deepak Mishra

Knowledge distillation is a technique for improving the performance of a simple "student" model by replacing its one-hot training labels with a distribution over labels obtained from a complex "teacher" model. While this simple approach has…

Machine Learning · Computer Science 2020-05-22 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Knowledge distillation is used, in generative language modeling, to train a smaller student model using the help of a larger teacher model, resulting in improved capabilities for the student model. In this paper, we formulate a more general…

Computation and Language · Computer Science 2025-02-26 Guanlin Liu, Anand Ramachandran, Tanmay Gangwani, Yan Fu, Abhinav Sethy

Knowledge distillation is an effective approach to learn compact models (students) with the supervision of large and strong models (teachers). As empirically there exists a strong correlation between the performance of teacher and student…

Machine Learning · Computer Science 2022-10-13 Chaofei Wang, Qisen Yang, Rui Huang, Shiji Song, Gao Huang

Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant…

Machine Learning · Computer Science 2023-04-11 Minghong Gao

Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being…

Machine Learning · Computer Science 2024-03-20 Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is…

Machine Learning · Computer Science 2021-03-02 Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

After a large "teacher" neural network has been trained on labeled data, the probabilities that the teacher assigns to incorrect classes reveal a lot of information about the way in which the teacher generalizes. By training a small…

Machine Learning · Computer Science 2020-06-12 Rafael Müller, Simon Kornblith, Geoffrey Hinton

Knowledge distillation is a simple but powerful way to transfer knowledge from a teacher model to a student model. Existing work suffers from at least one of the following key limitations in terms of direction and scope of transfer which…

Machine Learning · Computer Science 2024-02-12 Michael Livanos, Ian Davidson, Stephen Wong