Related papers: Ensemble Knowledge Distillation for Machine Learni…

200 papers

Ensemble models comprising deep Convolutional Neural Networks (CNN) have shown significant improvements in model generalization but at the cost of large computation and memory requirements. In this paper, we present a framework for…

Computer Vision and Pattern Recognition · Computer Science 2020-04-03 Umar Asif, Jianbin Tang, Stefan Harrer

Traditional knowledge distillation transfers "dark knowledge" of a pre-trained teacher network to a student network, and ignores the knowledge in the training process of the teacher, which we call teacher's experience. However, in realistic…

Computer Vision and Pattern Recognition · Computer Science 2022-02-28 Chaofei Wang, Shaowei Zhang, Shiji Song, Gao Huang

Recently, deep learning-based models have been widely studied for click-through rate (CTR) prediction and lead to improved prediction accuracy in many industrial applications. However, current research focuses primarily on building complex…

Machine Learning · Computer Science 2023-07-06 Jieming Zhu, Jinyang Liu, Weiqi Li, Jincai Lai, Xiuqiang He, Liang Chen, Zibin Zheng

Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are…

Machine Learning · Computer Science 2022-04-04 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Offline distillation is a two-stage pipeline that requires expensive resources to train a teacher network and then distill the knowledge to a student for deployment. Online knowledge distillation, on the other hand, is a one-stage strategy…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Baitan Shao, Ying Chen

Multi-Teacher knowledge distillation provides students with additional supervision from multiple pre-trained teachers with diverse information sources. Most existing methods explore different weighting strategies to obtain a powerful…

Computer Vision and Pattern Recognition · Computer Science 2023-06-13 Hailin Zhang, Defang Chen, Can Wang

Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex…

Machine Learning · Computer Science 2025-11-11 Patrik Kenfack, Ulrich Aïvodji, Samira Ebrahimi Kahou

Recent advancements in Neural Machine Translation (NMT) have significantly improved translation quality. However, the increasing size and complexity of state-of-the-art models present significant challenges for deployment on…

Computation and Language · Computer Science 2026-05-12 Xuewen Zhang, Haixiao Zhang, Xinlong Huang

We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the…

Machine Learning · Computer Science 2023-02-16 Zeyuan Allen-Zhu, Yuanzhi Li

Machine learning interatomic potentials (MLIPs) are revolutionizing the field of molecular dynamics (MD) simulations. Recent MLIPs have tended towards more complex architectures trained on larger datasets. The resulting increase in…

Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-27 Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an…

Computer Vision and Pattern Recognition · Computer Science 2021-10-25 Usma Niyaz, Deepti R. Bathula

Large language models (LLMs) have achieved remarkable performance across diverse domains, yet their enormous computational and memory requirements hinder deployment in resource-constrained environments. Knowledge distillation offers a…

Computation and Language · Computer Science 2026-05-05 Hao Zhang, Zhibin Zhang, Guangxin Wu, Wanyi Ning, Jiafeng Guo, Xueqi Cheng

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science 2019-09-25 SeongUk Park, Nojun Kwak

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have…

Machine Learning · Computer Science 2023-02-15 Konrad Zuchniak

Contrastive Language-Image Pre-training (CLIP) has been shown to improve zero-shot generalization capabilities of language and vision models. In this paper, we extend CLIP for efficient knowledge distillation, by utilizing embeddings as…

Machine Learning · Computer Science 2024-09-02 Lakshmi Nair

Knowledge distillation in machine learning is the process of transferring knowledge from a large model called the teacher to a smaller model called the student. Knowledge distillation is one of the techniques to compress the large network…

Machine Learning · Computer Science 2022-06-27 Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng

The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models are unable…

The promise of machine learning interatomic potentials (MLIPs) has led to an abundance of public quantum mechanical (QM) training datasets. The quality of an MLIP is directly limited by the accuracy of the energies and atomic forces in the…
