Related papers: Meta-Ensemble Parameter Learning

Online Ensemble Model Compression using Knowledge Distillation

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Weight Distillation: Transferring the Knowledge in Neural Network Parameters

Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of…

Computation and Language · Computer Science 2021-07-20 Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders applying them for resource-limited environments. It motivates distilling knowledge from the ensemble teacher into a smaller…

Machine Learning · Computer Science 2022-07-01 Giung Nam, Hyungi Lee, Byeongho Heo, Juho Lee

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Ensemble models comprising deep Convolutional Neural Networks (CNN) have shown significant improvements in model generalization but at the cost of large computation and memory requirements. In this paper, we present a framework for…

Computer Vision and Pattern Recognition · Computer Science 2020-04-03 Umar Asif, Jianbin Tang, Stefan Harrer

Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have…

Machine Learning · Computer Science 2023-02-15 Konrad Zuchniak

MEAL: Multi-Model Ensemble via Adversarial Learning

Often the best performing deep neural models are ensembles of multiple base-level networks. Unfortunately, the space required to store these many networks, and the time required to execute them at test-time, prohibits their use in…

Computer Vision and Pattern Recognition · Computer Science 2019-07-26 Zhiqiang Shen, Zhankui He, Xiangyang Xue

Adaptive Model Ensemble for Continual Learning

Model ensemble is an effective strategy in continual learning, which alleviates catastrophic forgetting by interpolating model parameters, achieving knowledge fusion learned from different tasks. However, existing model ensemble methods…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Yuchuan Mao, Zhi Gao, Xiaomeng Fan, Yuwei Wu, Yunde Jia, Chenchen Jing

Scaling Motion Forecasting Models with Ensemble Distillation

Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting…

Robotics · Computer Science 2024-05-15 Scott Ettinger, Kratarth Goel, Avikalp Srivastava, Rami Al-Rfou

BERT Learns to Teach: Knowledge Distillation with Meta Learning

We present Knowledge Distillation with Meta Learning (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training. We show the teacher network can learn…

Machine Learning · Computer Science 2022-04-05 Wangchunshu Zhou, Canwen Xu, Julian McAuley

Unified and Effective Ensemble Knowledge Distillation

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science 2022-04-04 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Context-Aware Ensemble Learning for Time Series

We investigate ensemble methods for prediction in an online setting. Unlike all the literature in ensembling, for the first time, we introduce a new approach using a meta learner that effectively combines the base model predictions via…

Machine Learning · Computer Science 2022-12-01 Arda Fazla, Mustafa Enes Aydin, Orhun Tamyigit, Suleyman Serdar Kozat

FEED: Feature-level Ensemble for Knowledge Distillation

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science 2019-09-25 SeongUk Park, Nojun Kwak

Ensemble Distillation for Neural Machine Translation

Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network. Translating a sentence with a Neural Machine Translation (NMT) engine is time expensive and having a…

Computation and Language · Computer Science 2017-08-09 Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran

Ensemble Distribution Distillation

Ensembles of models often yield improvements in system performance. These ensemble approaches have also been empirically shown to yield robust measures of uncertainty, and are capable of distinguishing between different forms of…

Machine Learning · Statistics 2019-11-27 Andrey Malinin, Bruno Mlodozeniec, Mark Gales

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

Resource-constrained perception systems such as edge computing and vision-for-robotics require vision models to be both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy to enhance…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui

Model Compression Using Optimal Transport

Model compression methods are important to allow for easier deployment of deep learning models in compute, memory and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithm…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Suhas Lohit, Michael Jones

Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification

Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Aakash Gore, Anoushka Dey, Aryan Mishra

Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network

There is a need for on-the-fly computation on very low-performance systems such as system-on-chip (SoC) and embedded devices. This paper presents pacemaker knowledge distillation as an intermediate ensemble teacher to use…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Wonchul Son, Youngbin Kim, Wonseok Song, Youngsu Moon, Wonjun Hwang

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the…

Machine Learning · Computer Science 2023-02-16 Zeyuan Allen-Zhu, Yuanzhi Li