Related papers: Self-Knowledge Distillation in Natural Language Pr…

Self-Knowledge Distillation for Learning Ambiguity

Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently…

Computation and Language · Computer Science 2024-06-17 Hancheol Park, Soyeong Jeong, Sukmin Cho, Jong C. Park

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning…

Computation and Language · Computer Science 2019-04-23 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks

Much of the focus in the area of knowledge distillation has been on distilling knowledge from a larger teacher network to a smaller student network. However, there has been little research on how the concept of distillation can be leveraged…

Neural and Evolutionary Computing · Computer Science 2019-01-29 Zhong Qiu Lin, Alexander Wong

Reinforced Multi-Teacher Selection for Knowledge Distillation

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation…

Computation and Language · Computer Science 2020-12-15 Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

Self-Knowledge Distillation with Progressive Refinement of Targets

The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In…

Machine Learning · Computer Science 2021-10-08 Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, Sangheum Hwang

A New Training Framework for Deep Neural Network

Knowledge distillation is the process of transferring the knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains the performance close to that of the…

Machine Learning · Computer Science 2021-03-26 Zhenyan Hou, Wenxuan Fan

Self-Knowledge Distillation via Dropout

To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by…

Computer Vision and Pattern Recognition · Computer Science 2022-08-12 Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang

Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data

This paper describes a novel knowledge distillation framework that leverages acoustically qualified speech data included in an existing training data pool as privileged information. In our proposed framework, a student network is trained…

Sound · Computer Science 2021-12-17 Tohru Nagano, Takashi Fukuda, Gakuto Kurata

Knowledge Distillation: A Survey

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver…

Machine Learning · Computer Science 2021-05-21 Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao

Selective Knowledge Distillation for Neural Machine Translation

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance the model's performance by transferring…

Computation and Language · Computer Science 2021-05-28 Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy…

Machine Learning · Computer Science 2019-05-21 Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student. However, just like massive classification, large scale knowledge distillation also imposes heavy computational costs…

Machine Learning · Computer Science 2018-12-04 Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Weishi Zheng

Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer

Deep neural network architectures have attained remarkable improvements in scene understanding tasks. Utilizing an efficient model is one of the most important constraints for limited-resource devices. Recently, several compression methods…

Computer Vision and Pattern Recognition · Computer Science 2020-10-12 Mahdi Ghorbani, Fahimeh Fooladgar, Shohreh Kasaei

Explaining Knowledge Distillation by Quantifying the Knowledge

This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts that are encoded in intermediate layers of a deep neural network (DNN). More…

Machine Learning · Computer Science 2020-03-26 Xu Cheng, Zhefan Rao, Yilan Chen, Quanshi Zhang

Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper…

Computation and Language · Computer Science 2016-09-23 Yoon Kim, Alexander M. Rush

Data Distillation for Text Classification

Deep learning techniques have achieved great success in many fields, while at the same time deep learning models are getting more complex and expensive to compute. It severely hinders the wide applications of these models. In order to…

Computation and Language · Computer Science 2021-04-20 Yongqi Li, Wenjie Li

Self-Referenced Deep Learning

Knowledge distillation is an effective approach to transferring knowledge from a teacher neural network to a student target network for satisfying the low-memory and fast running requirements in practice use. Whilst being able to create…

Computer Vision and Pattern Recognition · Computer Science 2018-11-20 Xu Lan, Xiatian Zhu, Shaogang Gong

Unraveling Key Factors of Knowledge Distillation

Knowledge distillation, a technique for model compression and performance enhancement, has gained significant traction in Neural Machine Translation (NMT). However, existing research primarily focuses on empirical applications, and there is…

Computation and Language · Computer Science 2023-12-27 Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo

Distilling Word Embeddings: An Encoding Approach

Distilling knowledge from a well-trained cumbersome network to a small one has recently become a new research topic, as lightweight neural networks with high performance are particularly in need in various resource-restricted systems. This…

Computation and Language · Computer Science 2016-07-26 Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin

Introspective Learning by Distilling Knowledge from Online Self-explanation

In recent years, many explanation methods have been proposed to explain individual classifications of deep neural networks. However, how to leverage the created explanations to improve the learning process has been less explored. As the…

Computer Vision and Pattern Recognition · Computer Science 2020-09-22 Jindong Gu, Zhiliang Wu, Volker Tresp