Related papers: Functional Ensemble Distillation

Diversity Matters When Learning From Ensembles

Deep ensembles excel in large-scale image classification tasks both in terms of prediction accuracy and calibration. Despite being simple to train, the computation and memory cost of deep ensembles limits their practicability. While some…

Machine Learning · Computer Science 2021-10-28 Giung Nam, Jongmin Yoon, Yoonho Lee, Juho Lee

Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast-Choose Three

Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling…

Machine Learning · Computer Science 2021-03-26 Steven Reich, David Mueller, Nicholas Andrews

Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

We study the problem of progressive ensemble distillation: Given a large, pretrained teacher model $g$, we seek to decompose the model into smaller, low-inference cost student models $f_i$, such that progressively evaluating additional…

Machine Learning · Computer Science 2023-11-10 Don Kurian Dennis, Abhishek Shetty, Anish Sevekari, Kazuhito Koishida, Virginia Smith

Ensemble Distribution Distillation

Ensembles of models often yield improvements in system performance. These ensemble approaches have also been empirically shown to yield robust measures of uncertainty, and are capable of distinguishing between different forms of…

Machine Learning · Statistics 2019-11-27 Andrey Malinin, Bruno Mlodozeniec, Mark Gales

Hydra: Preserving Ensemble Diversity for Model Distillation

Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling…

Machine Learning · Computer Science 2021-03-22 Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin, Rodolphe Jenatton

Credal Ensemble Distillation for Uncertainty Quantification

Deep ensembles (DE) have emerged as a powerful approach for quantifying predictive uncertainty and distinguishing its aleatoric and epistemic components, thereby enhancing model robustness and reliability. However, their high computational…

Machine Learning · Computer Science 2025-11-19 Kaizheng Wang, Fabio Cuzzolin, David Moens, Hans Hallez

A general framework for ensemble distribution distillation

Ensembles of neural networks have been shown to give better performance than single networks, both in terms of predictions and uncertainty estimation. Additionally, ensembles allow the uncertainty to be decomposed into aleatoric (data) and…

Machine Learning · Statistics 2021-01-11 Jakob Lindqvist, Amanda Olmin, Fredrik Lindsten, Lennart Svensson

Self-Distribution Distillation: Efficient Uncertainty Estimation

Deep learning is increasingly being applied in safety-critical domains. For these scenarios it is important to know the level of uncertainty in a model's prediction to ensure appropriate decisions are made by the system. Deep ensembles are…

Machine Learning · Computer Science 2022-03-17 Yassir Fathullah, Mark J. F. Gales

Ensemble Distillation for Robust Model Fusion in Federated Learning

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized. In most of the current training schemes the central model is refined by…

Machine Learning · Computer Science 2021-03-30 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation

Recent studies have shown that ensemble approaches could not only improve accuracy but also estimate model uncertainty in deep learning. However, it requires a large number of parameters according to the increase of ensemble models for…

Computer Vision and Pattern Recognition · Computer Science 2020-05-25 Hong Joo Lee, Seong Tae Kim, Hakmin Lee, Nassir Navab, Yong Man Ro

Anti-Distillation: Improving reproducibility of deep networks

Deep networks have been revolutionary in improving performance of machine learning and artificial intelligence systems. Their high prediction accuracy, however, comes at a price of model irreproducibility in very high levels that…

Machine Learning · Computer Science 2020-10-21 Gil I. Shamir, Lorenzo Coviello

Uncertainty quantification is a critical aspect of reinforcement learning and deep learning, with numerous applications ranging from efficient exploration and stable offline reinforcement learning to outlier detection in medical…

Machine Learning · Computer Science 2025-03-27 Moritz A. Zanger, Pascal R. Van der Vaart, Wendelin Böhmer, Matthijs T. J. Spaan

Uncertainty-Aware Retinal Vessel Segmentation via Ensemble Distillation

Uncertainty estimation is critical for reliable medical image segmentation, particularly in retinal vessel analysis, where accurate predictions are essential for diagnostic applications. Deep Ensembles, where multiple networks are trained…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Jeremiah Fadugba, Petru Manescu, Bolanle Oladejo, Delmiro Fernandez-Reyes, Philipp Berens

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble…

Machine Learning · Computer Science 2025-10-15 Yichen Li, Xiuying Wang, Wenchao Xu, Haozhao Wang, Yining Qi, Jiahua Dong, Ruixuan Li

FEED: Feature-level Ensemble for Knowledge Distillation

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science 2019-09-25 SeongUk Park, Nojun Kwak

Ensemble Distillation Approaches for Grammatical Error Correction

Ensemble approaches are commonly used techniques to improve a system by combining multiple model predictions. Additionally, these schemes allow the uncertainty, as well as the source of the uncertainty, to be derived for the prediction.…

Computation and Language · Computer Science 2020-12-16 Yassir Fathullah, Mark Gales, Andrey Malinin

Scaling Motion Forecasting Models with Ensemble Distillation

Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting…

Robotics · Computer Science 2024-05-15 Scott Ettinger, Kratarth Goel, Avikalp Srivastava, Rami Al-Rfou

Fed-ensemble: Improving Generalization through Model Ensembling in Federated Learning

In this paper we propose Fed-ensemble: a simple approach that brings model ensembling to federated learning (FL). Instead of aggregating local models to update a single global model, Fed-ensemble uses random permutations to update a group of…

Machine Learning · Statistics 2023-07-04 Naichen Shi, Fan Lai, Raed Al Kontar, Mosharaf Chowdhury

Unified and Effective Ensemble Knowledge Distillation

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science 2022-04-04 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the…

Machine Learning · Computer Science 2023-02-16 Zeyuan Allen-Zhu, Yuanzhi Li