Related papers: Light Multi-segment Activation for Model Compressi…

200 papers

Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge…

Computer Vision and Pattern Recognition · Computer Science 2022-04-15 Zhiyuan Wu, Hong Qi, Yu Jiang, Minghao Zhao, Chupeng Cui, Zongmin Yang, Xinhui Xue

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Deep pre-training and fine-tuning models (like BERT, OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer amount of model parameters, the inference speed of these models is very slow. How to…

Computation and Language · Computer Science 2019-04-23 Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang

Knowledge distillation is widely applied in various fundamental vision models to enhance the performance of compact models. Existing knowledge distillation methods focus on designing different distillation targets to acquire knowledge from…

Computer Vision and Pattern Recognition · Computer Science 2024-08-23 Yaoze Zhang, Yuming Zhang, Yu Zhao, Yue Zhang, Feiyu Zhu

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an…

Computer Vision and Pattern Recognition · Computer Science 2021-10-25 Usma Niyaz, Deepti R. Bathula

Several methods of knowledge distillation have been developed for neural network compression. While they all use the KL divergence loss to align the soft outputs of the student model more closely with those of the teacher, the various…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Huan Wang, Suhas Lohit, Michael Jones, Yun Fu

Model compression methods are important to allow for easier deployment of deep learning models in compute-, memory-, and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithm…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Suhas Lohit, Michael Jones

Compressing deep neural network (DNN) models has become an important and necessary technique for real-world applications, such as deploying those models on mobile devices. Knowledge distillation is one of the most popular methods for model…

Machine Learning · Computer Science 2020-03-02 Makoto Takamoto, Yusuke Morishita, Hitoshi Imaoka

Pre-trained language models (PLMs) achieve great success in NLP. However, their huge model sizes hinder their applications in many practical systems. Knowledge distillation is a popular technique to compress PLMs, which learns a small…

Computation and Language · Computer Science 2021-06-03 Chuhan Wu, Fangzhao Wu, Yongfeng Huang

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation…

Computation and Language · Computer Science 2020-12-15 Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

Large Language Models (LLMs) have demonstrated outstanding performance across a range of NLP tasks; however, their computational demands hinder their deployment in real-world, resource-constrained environments. This work investigates the…

Computation and Language · Computer Science 2025-07-11 Joyeeta Datta, Niclas Doll, Qusai Ramadan, Zeyd Boukhers

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks. However, the high demand for computing resources in training such models hinders their application in practice. In…

Computation and Language · Computer Science 2019-08-27 Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer amount of model parameters, the inference speed of these models is very slow.…

Computation and Language · Computer Science 2019-10-21 Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang

Large Language Models (LLMs) are highly accurate in classification tasks; however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD) where an LLM "teacher"…

Machine Learning · Computer Science 2025-11-18 Viviana Luccioli, Rithika Iyengar, Ryan Panley, Flora Haberkorn, Xiaoyu Ge, Leland Crane, Nitish Sinha, Seung Jung Lee

Knowledge distillation is a key technique for compressing large language models (LLMs), but most existing methods align representations at fixed layers or token-level outputs, ignoring how representations evolve across depth. As a result,…

Computation and Language · Computer Science 2026-05-05 Pham Khanh Chi, Quoc Phong Dao, Thuat Nguyen, Linh Ngo Van, Trung Le, Thanh Hong Nguyen

Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a…

Computation and Language · Computer Science 2020-05-04 Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong

Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning suffers from the…

Machine Learning · Computer Science 2020-04-01 Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

Transformer-based encoder-decoder models have achieved remarkable success in image-to-image transfer tasks, particularly in image restoration. However, their high computational complexity, manifested in elevated FLOPs and parameter…

Computer Vision and Pattern Recognition · Computer Science 2025-01-17 Yongheng Zhang, Danfeng Yan

Knowledge distillation is a potential solution for model compression. The idea is to make a small student network imitate the target of a large teacher network, so that the student network can be competitive with the teacher. Most previous…

Computer Vision and Pattern Recognition · Computer Science 2017-10-24 Chong Wang, Xipeng Lan, Yangang Zhang

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar, Zhiqiang Shen, Marios Savvides