Related papers: Online Ensemble Model Compression using Knowledge …

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Ensemble models comprising deep Convolutional Neural Networks (CNN) have shown significant improvements in model generalization but at the cost of large computation and memory requirements. In this paper, we present a framework for…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-03 · Umar Asif, Jianbin Tang, Stefan Harrer

Unified and Effective Ensemble Knowledge Distillation

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science · 2022-04-04 · Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have…

Machine Learning · Computer Science · 2023-02-15 · Konrad Zuchniak

Distilling Model Knowledge

Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like…

Machine Learning · Statistics · 2015-10-09 · George Papamakarios

Deep Collective Knowledge Distillation

Many existing studies on knowledge distillation have focused on methods in which a student model mimics a teacher model well. Simply imitating the teacher's knowledge, however, is not sufficient for the student to surpass that of the…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-19 · Jihyeon Seo, Kyusam Oh, Chanho Min, Yongkeun Yun, Sungwoo Cho

Online Knowledge Distillation via Multi-branch Diversity Enhancement

Knowledge distillation is an effective method to transfer the knowledge from the cumbersome teacher model to the lightweight student model. Online knowledge distillation uses the ensembled prediction results of multiple student models as…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-16 · Zheng Li, Ying Huang, Defang Chen, Tianren Luo, Ning Cai, Zhigeng Pan

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science · 2022-06-27 · Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Meta-Ensemble Parameter Learning

An ensemble of machine learning models yields improved performance as well as robustness. However, their memory requirements and inference costs can be prohibitively high. Knowledge distillation is an approach that allows a single model to…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-06 · Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the…

Machine Learning · Computer Science · 2023-02-16 · Zeyuan Allen-Zhu, Yuanzhi Li

Few Sample Knowledge Distillation for Efficient Network Compression

Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning suffers from the…

Machine Learning · Computer Science · 2020-04-01 · Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

Ensemble Distillation for Neural Machine Translation

Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network. Translating a sentence with a Neural Machine Translation (NMT) engine is time expensive and having a…

Computation and Language · Computer Science · 2017-08-09 · Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen
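The title above suggests reusing the teacher's classifier on top of the student. Here is a minimal sketch under that reading. It assumes a linear `projector` that maps student features into the teacher's feature space, a mean-squared-error alignment loss, and a frozen `teacher_classifier` weight matrix. All three are hypothetical choices, since the abstract is truncated before the method is described.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def feature_alignment_loss(
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    projector: Matrix,
) -> float:
    """Mean squared error between projected student features and teacher features."""
    projected = matvec(projector, student_feat)
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_feat)) / len(teacher_feat)


def predict_with_reused_classifier(
    student_feat: Sequence[float],
    projector: Matrix,
    teacher_classifier: Matrix,
) -> List[float]:
    """Logits from the (frozen) teacher classifier applied to projected student features."""
    return matvec(teacher_classifier, matvec(projector, student_feat))


# Hypothetical shapes: 2-d student features, 3-d teacher features, 2 classes.
projector = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_classifier = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
student_feat = [1.0, 2.0]
teacher_feat = [1.0, 2.0, 3.0]

print(feature_alignment_loss(student_feat, teacher_feat, projector))  # 0.0
print(predict_with_reused_classifier(student_feat, projector, teacher_classifier))  # [1.0, 3.0]
```

In this sketch only the projector (and the student that produces `student_feat`) would be trained. The classifier weights come from the teacher unchanged.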

Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification

Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Aakash Gore, Anoushka Dey, Aryan Mishra

Adaptive Group Robust Ensemble Knowledge Distillation

Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex…

Machine Learning · Computer Science · 2025-11-11 · Patrik Kenfack, Ulrich Aïvodji, Samira Ebrahimi Kahou

Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

Previous Knowledge Distillation based efficient image retrieval methods employ a lightweight network as the student model for fast inference. However, the lightweight student model lacks adequate representation capacity for effective…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-06 · Yi Xie, Huaidong Zhang, Xuemiao Xu, Jianqing Zhu, Shengfeng He

Model compression using knowledge distillation with integrated gradients

Model compression is critical for deploying deep learning models on resource-constrained devices. We introduce a novel method enhancing knowledge distillation with integrated gradients (IG) as a data augmentation strategy. Our approach…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-18 · David E. Hernandez, Jose Chang, Torbjörn E. M. Nordling
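The entry above uses integrated gradients (IG) as a data augmentation signal. The truncated abstract does not describe how the authors turn IG maps into augmentations. As a hedged illustration of the IG computation alone, the sketch below approximates IG_i = (x_i − x'_i) ∫₀¹ ∂F/∂x_i(x' + α(x − x')) dα with a midpoint Riemann sum. The toy model `model`, its hand-written gradient `model_grad`, the weights `W` and the all-zero baseline are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 200,
) -> Vector:
    """Approximate integrated gradients along the straight path baseline -> x."""
    if len(x) != len(baseline):
        raise ValueError("x and baseline must have the same length")
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    # Average gradient along the path, scaled by the input difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical toy "model": F(x) = tanh(W . x), with its analytic gradient.
W = [0.5, -1.0, 2.0]


def model(x: Sequence[float]) -> float:
    return math.tanh(sum(wi * xi for wi, xi in zip(W, x)))


def model_grad(x: Sequence[float]) -> Vector:
    t = model(x)
    return [(1.0 - t * t) * wi for wi in W]


x = [1.0, 0.5, 0.25]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(model_grad, x, baseline)
# Completeness check: the attributions should sum to F(x) - F(baseline).
print(sum(attributions), model(x) - model(baseline))
```

The completeness property printed at the end is a convenient sanity check on the number of integration `steps`. If the sum drifts from F(x) − F(x′), the path is being sampled too coarsely.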

Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of…

Machine Learning · Statistics · 2015-03-10 · Geoffrey Hinton, Oriol Vinyals, Jeff Dean
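The entry above starts from averaging the predictions of an ensemble and then distilling that ensemble into one model. Below is a minimal sketch of soft-target distillation from an ensemble teacher. It averages the teachers' temperature-softened distributions with an arithmetic mean and combines a T²-scaled KL term with an ordinary cross-entropy on the true label. The mean, the temperature T = 4 and the weight `alpha` = 0.9 are illustrative choices, not values taken from the paper. The KL form differs from cross-entropy with soft targets only by the teacher's entropy, which is constant with respect to the student.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(
    teacher_logits: Sequence[Sequence[float]], temperature: float
) -> List[float]:
    """Arithmetic mean of the teachers' temperature-softened distributions."""
    probs = [softmax(z, temperature) for z in teacher_logits]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.9,
) -> float:
    """alpha * T^2 * KL(soft targets || student at T) + (1 - alpha) * hard-label CE."""
    targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(targets, student_soft) if t > 0)
    # Soft-target gradients shrink roughly as 1/T^2, so the term is rescaled by T^2.
    soft_term = temperature ** 2 * kl
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1.0 - alpha) * hard_term


teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]  # two hypothetical teachers
print(distillation_loss([1.8, 0.7, -0.9], teachers, label=0))
```

Raising the temperature flattens the teacher distribution and exposes more of the relative probabilities it assigns to wrong classes. That information is what the soft term transfers.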

Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders applying them for resource-limited environments. It motivates distilling knowledge from the ensemble teacher into a smaller…

Machine Learning · Computer Science · 2022-07-01 · Giung Nam, Hyungi Lee, Byeongho Heo, Juho Lee
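The title above mentions weight averaging. The truncated abstract does not say when or how the paper averages weights. This sketch therefore shows only the generic operation, in the style of stochastic weight averaging: an equal-weight running mean of parameter snapshots. The `WeightAverager` class and the flat `Params` dictionary layout are hypothetical stand-ins for a real model's state.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


class WeightAverager:
    """Equal-weight running average of parameter snapshots (SWA-style)."""

    def __init__(self) -> None:
        self.average: Params = {}
        self.count = 0

    def update(self, params: Params) -> None:
        """Fold one snapshot into the running mean."""
        self.count += 1
        if self.count == 1:
            # First snapshot: copy it so later in-place updates don't alias it.
            self.average = {name: list(values) for name, values in params.items()}
            return
        for name, values in params.items():
            avg = self.average[name]
            for i, v in enumerate(values):
                # Incremental mean: avg_n = avg_{n-1} + (v - avg_{n-1}) / n
                avg[i] += (v - avg[i]) / self.count


averager = WeightAverager()
for snapshot in ({"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 0.0]}):
    averager.update(snapshot)
print(averager.average)  # {'w': [3.0, 2.0]}
```

With real networks, batch-normalization statistics are typically recomputed after averaging. The averaged weights no longer match the running statistics collected during training.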

A general framework for ensemble distribution distillation

Ensembles of neural networks have been shown to give better performance than single networks, both in terms of predictions and uncertainty estimation. Additionally, ensembles allow the uncertainty to be decomposed into aleatoric (data) and…

Machine Learning · Statistics · 2021-01-11 · Jakob Lindqvist, Amanda Olmin, Fredrik Lindsten, Lennart Svensson
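The abstract above notes that ensembles allow uncertainty to be decomposed into an aleatoric (data) part and a second component. A common entropy-based way to compute this from ensemble member predictions is sketched below, with these three quantities:

- **Total uncertainty:** the entropy of the mean prediction.
- **Aleatoric part:** the mean entropy of the members.
- **Epistemic part:** the difference between the two, which is the mutual information between the prediction and the member index.

This illustrates the decomposition only, not the paper's distribution-distillation framework. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def decompose_uncertainty(
    member_probs: Sequence[Sequence[float]],
) -> Tuple[float, float, float]:
    """Return (total, aleatoric, epistemic) uncertainty for one input."""
    n = len(member_probs)
    num_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(num_classes)]
    total = entropy(mean)                                    # H[E_m p_m]
    aleatoric = sum(entropy(p) for p in member_probs) / n    # E_m H[p_m]
    epistemic = total - aleatoric                            # mutual information, >= 0
    return total, aleatoric, epistemic


# Members agree: all uncertainty is aleatoric.
agree = decompose_uncertainty([[0.7, 0.3], [0.7, 0.3]])
# Members are individually confident but disagree: mostly epistemic.
disagree = decompose_uncertainty([[0.99, 0.01], [0.01, 0.99]])
print(agree, disagree)
```

The epistemic term is non-negative because entropy is concave. It vanishes exactly when all members produce the same distribution, which is why it is read as disagreement between models rather than noise in the data.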

Knowledge Distillation by On-the-Fly Native Ensemble

Knowledge distillation is effective to train small and generalisable network models for meeting the low-memory and fast running requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-11 · Xu Lan, Xiatian Zhu, Shaogang Gong
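The entry above builds the teacher on the fly from the network itself instead of relying on a pre-trained one. Here is a minimal sketch, assuming a multi-branch design in which the teacher is formed and used as follows:

- A gate produces softmax weights over the branches.
- The weighted sum of branch logits serves as the teacher.
- Each branch is pulled toward that teacher with a temperature-scaled KL term.

The gate scores here are fixed numbers rather than a learned layer, and T = 3 is an arbitrary choice. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [v / temperature for v in values]
    top = max(scaled)
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gated_ensemble_logits(
    branch_logits: Sequence[Sequence[float]], gate_scores: Sequence[float]
) -> List[float]:
    """Weighted sum of branch logits, with weights = softmax(gate scores)."""
    weights = softmax(gate_scores)
    num_classes = len(branch_logits[0])
    return [sum(w * z[c] for w, z in zip(weights, branch_logits)) for c in range(num_classes)]


def online_distillation_terms(
    branch_logits: Sequence[Sequence[float]],
    gate_scores: Sequence[float],
    temperature: float = 3.0,
) -> List[float]:
    """T^2-scaled KL(ensemble teacher || branch) for every branch."""
    teacher = softmax(gated_ensemble_logits(branch_logits, gate_scores), temperature)
    return [
        temperature ** 2 * kl_divergence(teacher, softmax(z, temperature))
        for z in branch_logits
    ]


branches = [[1.0, 2.0], [3.0, 0.0]]
print(gated_ensemble_logits(branches, [0.0, 0.0]))  # [2.0, 1.0]
print(online_distillation_terms(branches, [0.5, -0.5]))
```

In training, these per-branch terms would be added to each branch's usual cross-entropy loss. Because the teacher is assembled from the branches themselves, no separate pre-trained teacher is needed.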