Related papers: Improving Neural Topic Models using Knowledge Dist…

Improving Neural Topic Models with Wasserstein Knowledge Distillation

Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large…

Computation and Language · Computer Science 2024-06-21 Suman Adhya, Debarshi Kumar Sanyal

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Distilling Model Knowledge

Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like…

Machine Learning · Statistics 2015-10-09 George Papamakarios

Continual Knowledge Distillation for Neural Machine Translation

While many parallel corpora are not publicly accessible for data copyright, data privacy and competitive differentiation reasons, trained translation models are increasingly available on open platforms. In this work, we propose a method…

Computation and Language · Computer Science 2023-06-13 Yuanchi Zhang, Peng Li, Maosong Sun, Yang Liu

Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation

Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although emerging literature dealing with low resource languages starts to show promising results, most state-of-the-art models used…

Computation and Language · Computer Science 2020-04-17 Idriss Mghabbar, Pirashanth Ratnamogan

On the Orthogonality of Knowledge Distillation with Other Techniques: From an Ensemble Perspective

To put a state-of-the-art neural network to practical use, it is necessary to design a model that has a good trade-off between the resource consumption and performance on the test set. Many researchers and engineers are developing methods…

Machine Learning · Computer Science 2020-09-15 SeongUk Park, KiYoon Yoo, Nojun Kwak

Selective Knowledge Distillation for Neural Machine Translation

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance the model's performance by transferring…

Computation and Language · Computer Science 2021-05-28 Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model…

Computation and Language · Computer Science 2020-10-13 Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

Knowledge Distillation with Training Wheels

Knowledge distillation is used, in generative language modeling, to train a smaller student model using the help of a larger teacher model, resulting in improved capabilities for the student model. In this paper, we formulate a more general…

Computation and Language · Computer Science 2025-02-26 Guanlin Liu, Anand Ramachandran, Tanmay Gangwani, Yan Fu, Abhinav Sethy

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret.…

Computation and Language · Computer Science 2021-06-18 Federico Bianchi, Silvia Terragni, Dirk Hovy

Knowledge Distillation in Deep Learning and its Applications

Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student…

Machine Learning · Computer Science 2021-05-21 Abdolmaged Alkhulaifi, Fahad Alsahli, Irfan Ahmad

Distilling Knowledge for Search-based Structured Prediction

Many natural language processing tasks can be modeled into structured prediction and solved as a search problem. In this paper, we distill an ensemble of multiple models trained with different initialization into a single model. In addition…

Computation and Language · Computer Science 2018-05-30 Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu

Using Knowledge Distillation to improve interpretable models in a retail banking context

This article sets forth a review of knowledge distillation techniques with a focus on their applicability to retail banking contexts. Predictive machine learning algorithms used in banking environments, especially in risk and control…

Machine Learning · Computer Science 2022-10-03 Maxime Biehler, Mohamed Guermazi, Célim Starck

Distilling Linguistic Context for Language Model Compression

A computationally expensive and memory intensive neural network lies behind the recent success of language representation learning. Knowledge distillation, a major technique for deploying such a vast language model in resource-scarce…

Computation and Language · Computer Science 2021-09-20 Geondo Park, Gyeongman Kim, Eunho Yang

Knowledge Distillation: A Survey

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver…

Machine Learning · Computer Science 2021-05-21 Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao

Towards a Unified View of Affinity-Based Knowledge Distillation

Knowledge transfer between artificial neural networks has become an important topic in deep learning. Among the open questions are what kind of knowledge needs to be preserved for the transfer, and how it can be effectively achieved.…

Computer Vision and Pattern Recognition · Computer Science 2022-10-03 Vladimir Li, Atsuto Maki

Unraveling Key Factors of Knowledge Distillation

Knowledge distillation, a technique for model compression and performance enhancement, has gained significant traction in Neural Machine Translation (NMT). However, existing research primarily focuses on empirical applications, and there is…

Computation and Language · Computer Science 2023-12-27 Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo

Distillation of neural network models for detection and description of key points of images

Image matching and classification methods, as well as synchronous location and mapping, are widely used on embedded and mobile devices. Their most resource-intensive part is the detection and description of the key points of the images. And…

Computer Vision and Pattern Recognition · Computer Science 2020-06-19 A. V. Yashchenko, A. V. Belikov, M. V. Peterson, A. S. Potapov

On the Impact of Knowledge Distillation for Model Interpretability

Several recent studies have elucidated why knowledge distillation (KD) improves model performance. However, few have researched the other advantages of KD in addition to its improving model performance. In this study, we have attempted to…

Machine Learning · Computer Science 2023-05-26 Hyeongrok Han, Siwon Kim, Hyun-Soo Choi, Sungroh Yoon

Improving the Interpretability of Deep Neural Networks with Knowledge Distillation

Deep Neural Networks have achieved huge success at a wide spectrum of applications from language modeling, computer vision to speech recognition. However, nowadays, good performance alone is not sufficient to satisfy the needs of practical…

Machine Learning · Computer Science 2018-12-31 Xuan Liu, Xiaoguang Wang, Stan Matwin