Related papers: Distilling Model Knowledge

Knowledge Distillation: A Survey

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver…

Machine Learning · Computer Science 2021-05-21 Jianping Gou , Baosheng Yu , Stephen John Maybank , Dacheng Tao

Knowledge Distillation in Deep Learning and its Applications

Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student…

Machine Learning · Computer Science 2021-05-21 Abdolmaged Alkhulaifi , Fahad Alsahli , Irfan Ahmad

Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of…

Machine Learning · Statistics 2015-03-10 Geoffrey Hinton , Oriol Vinyals , Jeff Dean

Online Ensemble Model Compression using Knowledge Distillation

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar , Zhiqiang Shen , Marios Savvides

Dataset Distillation

Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge…

Machine Learning · Computer Science 2020-02-26 Tongzhou Wang , Jun-Yan Zhu , Antonio Torralba , Alexei A. Efros

A New Training Framework for Deep Neural Network

Knowledge distillation is the process of transferring the knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains the performance close to that of the…

Machine Learning · Computer Science 2021-03-26 Zhenyan Hou , Wenxuan Fan

A Comprehensive Review of Knowledge Distillation in Computer Vision

Deep learning techniques have been demonstrated to surpass preceding cutting-edge machine learning techniques in recent years, with computer vision being one of the most prominent examples. However, deep learning models suffer from…

Computer Vision and Pattern Recognition · Computer Science 2024-07-24 Gousia Habib , Tausifa jan Saleem , Sheikh Musa Kaleem , Tufail Rouf , Brejesh Lall

MED-TEX: Transferring and Explaining Knowledge with Less Data from Pretrained Medical Imaging Models

Deep learning methods usually require a large amount of training data and lack interpretability. In this paper, we propose a novel knowledge distillation and model interpretation framework for medical image classification that jointly…

Computer Vision and Pattern Recognition · Computer Science 2022-01-13 Thanh Nguyen-Duc , He Zhao , Jianfei Cai , Dinh Phung

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta , Himel Das Gupta , Victor S. Sheng

Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have…

Machine Learning · Computer Science 2023-02-15 Konrad Zuchniak

Knowledge Distillation Beyond Model Compression

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz , Elahe Arani , Bahram Zonooz

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang , Yi Lu , Yongyu Mu , Yimin Hu , Tong Xiao , Jingbo Zhu

Towards a theory of model distillation

Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06,HVD15]. Despite many practical applications, basic questions about the extent to which models can be…

Machine Learning · Computer Science 2024-05-07 Enric Boix-Adsera

Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer

Deep neural network architectures have attained remarkable improvements in scene understanding tasks. Utilizing an efficient model is one of the most important constraints for limited-resource devices. Recently, several compression methods…

Computer Vision and Pattern Recognition · Computer Science 2020-10-12 Mahdi Ghorbani , Fahimeh Fooladgar , Shohreh Kasaei

Knowledge Distillation with Training Wheels

Knowledge distillation is used, in generative language modeling, to train a smaller student model using the help of a larger teacher model, resulting in improved capabilities for the student model. In this paper, we formulate a more general…

Computation and Language · Computer Science 2025-02-26 Guanlin Liu , Anand Ramachandran , Tanmay Gangwani , Yan Fu , Abhinav Sethy

Efficient Sub-structured Knowledge Distillation

Structured prediction models aim at solving a type of problem where the output is a complex structure, rather than a single variable. Performing knowledge distillation for such models is not trivial due to their exponentially large output…

Machine Learning · Computer Science 2022-03-10 Wenye Lin , Yangming Li , Lemao Liu , Shuming Shi , Hai-tao Zheng

Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System

Deep pre-training and fine-tuning models (like BERT, OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer amount of model parameters, the inference speed of these models is very slow. How to…

Computation and Language · Computer Science 2019-04-23 Ze Yang , Linjun Shou , Ming Gong , Wutao Lin , Daxin Jiang

Data-to-Model Distillation: Data-Efficient Learning Framework

Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Ahmad Sajedi , Samir Khaki , Lucy Z. Liu , Ehsan Amjadian , Yuri A. Lawryshyn , Konstantinos N. Plataniotis

On the Orthogonality of Knowledge Distillation with Other Techniques: From an Ensemble Perspective

To put a state-of-the-art neural network to practical use, it is necessary to design a model that has a good trade-off between the resource consumption and performance on the test set. Many researchers and engineers are developing methods…

Machine Learning · Computer Science 2020-09-15 SeongUk Park , KiYoon Yoo , Nojun Kwak

Distilling Lightweight Domain Experts from Large ML Models by Identifying Relevant Subspaces

Knowledge distillation involves transferring the predictive capabilities of large, high-performing AI models (teachers) to smaller models (students) that can operate in environments with limited computing power. In this paper, we address…

Machine Learning · Computer Science 2026-01-12 Pattarawat Chormai , Ali Hashemi , Klaus-Robert Müller , Grégoire Montavon