Related papers: Ensemble Knowledge Distillation for Machine Learni…

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Ensemble models comprising deep Convolutional Neural Networks (CNN) have shown significant improvements in model generalization but at the cost of large computation and memory requirements. In this paper, we present a framework for…

Computer Vision and Pattern Recognition · Computer Science 2020-04-03 Umar Asif, Jianbin Tang, Stefan Harrer

Learn From the Past: Experience Ensemble Knowledge Distillation

Traditional knowledge distillation transfers "dark knowledge" of a pre-trained teacher network to a student network, and ignores the knowledge in the training process of the teacher, which we call teacher's experience. However, in realistic…

Computer Vision and Pattern Recognition · Computer Science 2022-02-28 Chaofei Wang, Shaowei Zhang, Shiji Song, Gao Huang

Ensemble Knowledge Distillation for CTR Prediction

Recently, deep learning-based models have been widely studied for click-through rate (CTR) prediction and have led to improved prediction accuracy in many industrial applications. However, current research focuses primarily on building complex…

Machine Learning · Computer Science 2023-07-06 Jieming Zhu, Jinyang Liu, Weiqi Li, Jincai Lai, Xiuqiang He, Liang Chen, Zibin Zheng

Unified and Effective Ensemble Knowledge Distillation

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science 2022-04-04 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Online Ensemble Model Compression using Knowledge Distillation

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Decoupled Knowledge with Ensemble Learning for Online Distillation

Offline distillation is a two-stage pipeline that requires expensive resources to train a teacher network and then distill the knowledge to a student for deployment. Online knowledge distillation, on the other hand, is a one-stage strategy…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Baitan Shao, Ying Chen

Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning

Multi-Teacher knowledge distillation provides students with additional supervision from multiple pre-trained teachers with diverse information sources. Most existing methods explore different weighting strategies to obtain a powerful…

Computer Vision and Pattern Recognition · Computer Science 2023-06-13 Hailin Zhang, Defang Chen, Can Wang

Adaptive Group Robust Ensemble Knowledge Distillation

Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex…

Machine Learning · Computer Science 2025-11-11 Patrik Kenfack, Ulrich Aïvodji, Samira Ebrahimi Kahou

Evolving Knowledge Distillation for Lightweight Neural Machine Translation

Recent advancements in Neural Machine Translation (NMT) have significantly improved translation quality. However, the increasing size and complexity of state-of-the-art models present significant challenges for deployment on…

Computation and Language · Computer Science 2026-05-12 Xuewen Zhang, Haixiao Zhang, Xinlong Huang

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

We formally study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the…

Machine Learning · Computer Science 2023-02-16 Zeyuan Allen-Zhu, Yuanzhi Li

Teacher-student training improves accuracy and efficiency of machine learning interatomic potentials

Machine learning interatomic potentials (MLIPs) are revolutionizing the field of molecular dynamics (MD) simulations. Recent MLIPs have tended towards more complex architectures trained on larger datasets. The resulting increase in…

Chemical Physics · Physics 2025-06-16 Sakib Matin, Alice E. A. Allen, Emily Shinkle, Aleksandra Pachalieva, Galen T. Craven, Benjamin Nebgen, Justin S. Smith, Richard Messerly, Ying Wai Li, Sergei Tretiak, Kipton Barros, Nicholas Lubbers

Ensemble knowledge distillation of self-supervised speech models

Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-27 Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an…

Computer Vision and Pattern Recognition · Computer Science 2021-10-25 Usma Niyaz, Deepti R. Bathula

EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer

Large language models (LLMs) have achieved remarkable performance across diverse domains, yet their enormous computational and memory requirements hinder deployment in resource-constrained environments. Knowledge distillation offers a…

Computation and Language · Computer Science 2026-05-05 Hao Zhang, Zhibin Zhang, Guangxin Wu, Wanyi Ning, Jiafeng Guo, Xueqi Cheng

FEED: Feature-level Ensemble for Knowledge Distillation

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science 2019-09-25 SeongUk Park, Nojun Kwak

Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have…

Machine Learning · Computer Science 2023-02-15 Konrad Zuchniak

CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers

Contrastive Language-Image Pre-training (CLIP) has been shown to improve zero-shot generalization capabilities of language and vision models. In this paper, we extend CLIP for efficient knowledge distillation, by utilizing embeddings as…

Machine Learning · Computer Science 2024-09-02 Lakshmi Nair

Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

Learning Together: Towards foundational models for machine learning interatomic potentials with meta-learning

The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models are unable…

Chemical Physics · Physics 2023-07-11 Alice E. A. Allen, Nicholas Lubbers, Sakib Matin, Justin Smith, Richard Messerly, Sergei Tretiak, Kipton Barros

Multi-fidelity learning for interatomic potentials: Low-level forces and high-level energies are all you need

The promise of machine learning interatomic potentials (MLIPs) has led to an abundance of public quantum mechanical (QM) training datasets. The quality of an MLIP is directly limited by the accuracy of the energies and atomic forces in the…

Computational Physics · Physics 2025-09-23 Mitchell Messerly, Sakib Matin, Alice E. A. Allen, Benjamin Nebgen, Kipton Barros, Justin S. Smith, Nicholas Lubbers, Richard Messerly