Related papers: Meta-Ensemble Parameter Learning

200 papers

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of…

Computation and Language · Computer Science 2021-07-20 Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders applying them for resource-limited environments. It motivates distilling knowledge from the ensemble teacher into a smaller…

Machine Learning · Computer Science 2022-07-01 Giung Nam, Hyungi Lee, Byeongho Heo, Juho Lee

Ensemble models comprising deep Convolutional Neural Networks (CNN) have shown significant improvements in model generalization but at the cost of large computation and memory requirements. In this paper, we present a framework for…

Computer Vision and Pattern Recognition · Computer Science 2020-04-03 Umar Asif, Jianbin Tang, Stefan Harrer

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have…

Machine Learning · Computer Science 2023-02-15 Konrad Zuchniak

Often the best performing deep neural models are ensembles of multiple base-level networks. Unfortunately, the space required to store these many networks, and the time required to execute them at test-time, prohibits their use in…

Computer Vision and Pattern Recognition · Computer Science 2019-07-26 Zhiqiang Shen, Zhankui He, Xiangyang Xue

Model ensemble is an effective strategy in continual learning, which alleviates catastrophic forgetting by interpolating model parameters, achieving knowledge fusion learned from different tasks. However, existing model ensemble methods…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Yuchuan Mao, Zhi Gao, Xiaomeng Fan, Yuwei Wu, Yunde Jia, Chenchen Jing

Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting…

Robotics · Computer Science 2024-05-15 Scott Ettinger, Kratarth Goel, Avikalp Srivastava, Rami Al-Rfou

We present Knowledge Distillation with Meta Learning (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training. We show the teacher network can learn…

Machine Learning · Computer Science 2022-04-05 Wangchunshu Zhou, Canwen Xu, Julian McAuley

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science 2022-04-04 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

We investigate ensemble methods for prediction in an online setting. Unlike all the literature in ensembling, for the first time, we introduce a new approach using a meta learner that effectively combines the base model predictions via…

Machine Learning · Computer Science 2022-12-01 Arda Fazla, Mustafa Enes Aydin, Orhun Tamyigit, Suleyman Serdar Kozat

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science 2019-09-25 SeongUk Park, Nojun Kwak

Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network. Translating a sentence with a Neural Machine Translation (NMT) engine is time expensive and having a…

Computation and Language · Computer Science 2017-08-09 Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran

Ensembles of models often yield improvements in system performance. These ensemble approaches have also been empirically shown to yield robust measures of uncertainty, and are capable of distinguishing between different forms of…

Machine Learning · Statistics 2019-11-27 Andrey Malinin, Bruno Mlodozeniec, Mark Gales

Resource-constrained perception systems such as edge computing and vision-for-robotics require vision models to be both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy to enhance…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yi-Xiong Wang, Liang-Yan Gui

Model compression methods are important to allow for easier deployment of deep learning models in compute, memory and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithm…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Suhas Lohit, Michael Jones

Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Aakash Gore, Anoushka Dey, Aryan Mishra

There is a need for on-the-fly computation on very low-performance systems such as systems-on-chip (SoC) and embedded devices. This paper presents pacemaker knowledge distillation as intermediate ensemble teacher to use…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Wonchul Son, Youngbin Kim, Wonseok Song, Youngsu Moon, Wonjun Hwang

We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the…

Machine Learning · Computer Science 2023-02-16 Zeyuan Allen-Zhu, Yuanzhi Li