Related papers: CILDA: Contrastive Data Augmentation using Interme…

200 papers

Knowledge distillation (KD) is a highly promising method for mitigating the computational problems of pre-trained language models (PLMs). Among various KD approaches, Intermediate Layer Distillation (ILD) has been a de facto standard KD…

Computation and Language · Computer Science 2023-02-06 Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn, Du-Seong Chang, Se-Young Yun

Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one. Even though KD has shown promise on a wide range of Natural Language Processing (NLP) applications,…

Computation and Language · Computer Science 2021-09-21 Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh

Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. Since the teacher model perceives data in a way different from humans, existing KD methods only distill…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao

Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their…

Computation and Language · Computer Science 2021-03-18 Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Knowledge distillation (KD) is widely used for training a compact model with the supervision of another large model, which could effectively improve the performance. Previous methods mainly focus on two aspects: 1) training the student to…

Computer Vision and Pattern Recognition · Computer Science 2020-07-27 Tiancheng Wen, Shenqi Lai, Xueming Qian

The teacher-free online Knowledge Distillation (KD) aims to train an ensemble of multiple student models collaboratively and distill knowledge from each other. Although existing online KD methods achieve desirable performance, they often…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Chuanguang Yang, Zhulin An, Helong Zhou, Fuzhen Zhuang, Yongjun Xu, Qian Zhan

Knowledge distillation (KD) is a valuable yet challenging approach that enhances a compact student network by learning from a high-performance but cumbersome teacher model. However, previous KD methods for image restoration overlook the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Yunshuai Zhou, Junbo Qiao, Jincheng Liao, Wei Li, Simiao Li, Jiao Xie, Yunhang Shen, Jie Hu, Shaohui Lin

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…

Machine Learning · Computer Science 2023-01-31 Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Knowledge distillation (KD) has been widely used to transfer knowledge from large, accurate models (teachers) to smaller, efficient ones (students). Recent methods have explored enforcing consistency by incorporating causal interpretations…

Computer Vision and Pattern Recognition · Computer Science 2025-07-17 Nikolaos Giakoumoglou, Tania Stathaki

Knowledge distillation is a mainstream algorithm in model compression by transferring knowledge from the larger model (teacher) to the smaller model (student) to improve the performance of student. Despite many efforts, existing methods…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Muhe Ding, Jianlong Wu, Xue Dong, Xiaojie Li, Pengda Qin, Tian Gan, Liqiang Nie

Recently, the advance in deep learning has brought a considerable improvement in the end-to-end speech recognition field, simplifying the traditional pipeline while producing promising results. Among the end-to-end models, the connectionist…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-29 Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods…

Machine Learning · Computer Science 2022-12-13 Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its…

Computer Vision and Pattern Recognition · Computer Science 2025-05-14 Nikolaos Giakoumoglou, Tania Stathaki

The advent of large pre-trained language models has given rise to rapid progress in the field of Natural Language Processing (NLP). While the performance of these models on standard benchmarks has scaled with size, compression techniques…

Computation and Language · Computer Science 2021-05-14 Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Fei Ding, Yin Yang, Hongxin Hu, Venkat Krovi, Feng Luo

Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression, substantially reducing the dependency on the original training data. Nonetheless, conventional DFKD methods that employ…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Muquan Li, Dongyang Zhang, Tao He, Xiurui Xie, Yuan-Fang Li, Ke Qin

Mixup is a popular data augmentation technique based on creating new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the…

Computer Vision and Pattern Recognition · Computer Science 2022-11-10 Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance…

Machine Learning · Computer Science 2020-02-24 Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

Intermediate layer knowledge distillation (KD) can improve the standard KD technique (which only targets the output of teacher and student models) especially over large pre-trained language models. However, intermediate layer distillation…

Computation and Language · Computer Science 2021-10-05 Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart