Related papers: Sequence-Level Knowledge Distillation

Selective Knowledge Distillation for Neural Machine Translation

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance the model's performance by transferring…

Computation and Language · Computer Science · 2021-05-28 · Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou

Ensemble Distillation for Neural Machine Translation

Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network. Translating a sentence with a Neural Machine Translation (NMT) engine is time expensive and having a…

Computation and Language · Computer Science · 2017-08-09 · Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model…

Computation and Language · Computer Science · 2020-10-13 · Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models

Knowledge distillation (KD) is a well-known method for compressing neural models. However, works focusing on distilling knowledge from large multilingual neural machine translation (MNMT) models into smaller ones are practically…

Computation and Language · Computer Science · 2023-04-20 · Varun Gumma, Raj Dabre, Pratyush Kumar

Unraveling Key Factors of Knowledge Distillation

Knowledge distillation, a technique for model compression and performance enhancement, has gained significant traction in Neural Machine Translation (NMT). However, existing research primarily focuses on empirical applications, and there is…

Computation and Language · Computer Science · 2023-12-27 · Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo

Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

With the growth of computing power, neural machine translation (NMT) models also grow accordingly and become better. However, they also become harder to deploy on edge devices due to memory constraints. To cope with this problem, a common…

Computation and Language · Computer Science · 2020-10-08 · Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu

Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation

Benefiting from the sequence-level knowledge distillation, the Non-Autoregressive Transformer (NAT) achieves great success in neural machine translation tasks. However, existing knowledge distillation has side effects, such as propagating…

Computation and Language · Computer Science · 2023-08-07 · Min Liu, Yu Bao, Chengqi Zhao, Shujian Huang

Weight Distillation: Transferring the Knowledge in Neural Network Parameters

Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of…

Computation and Language · Computer Science · 2021-07-20 · Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Generation-Distillation for Efficient Natural Language Understanding in Low-Data Settings

Over the past year, the emergence of transfer learning with large-scale language models (LM) has led to dramatic performance improvements across a broad range of natural language understanding tasks. However, the size and memory footprint…

Computation and Language · Computer Science · 2020-02-04 · Luke Melas-Kyriazi, George Han, Celine Liang

Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation

Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although emerging literature dealing with low resource languages starts to show promising results, most state-of-the-art models used…

Computation and Language · Computer Science · 2020-04-17 · Idriss Mghabbar, Pirashanth Ratnamogan

Evolving Knowledge Distillation for Lightweight Neural Machine Translation

Recent advancements in Neural Machine Translation (NMT) have significantly improved translation quality. However, the increasing size and complexity of state-of-the-art models present significant challenges for deployment on…

Computation and Language · Computer Science · 2026-05-12 · Xuewen Zhang, Haixiao Zhang, Xinlong Huang

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

Knowledge distillation, transferring knowledge from a teacher model to a student model, has emerged as a powerful technique in neural machine translation for compressing models or simplifying training targets. Knowledge distillation…

Computation and Language · Computer Science · 2024-04-24 · Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo

Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation

A common scenario of Multilingual Neural Machine Translation (MNMT) is that each translation task arrives in a sequential manner, and the training data of previous tasks is unavailable. In this scenario, the current methods suffer heavily…

Computation and Language · Computer Science · 2022-12-07 · Yang Zhao, Junnan Zhu, Lu Xiang, Jiajun Zhang, Yu Zhou, Feifei Zhai, Chengqing Zong

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used,…

Computation and Language · Computer Science · 2020-06-24 · Mitchell A. Gordon, Kevin Duh

Nearest Neighbor Knowledge Distillation for Neural Machine Translation

k-nearest-neighbor machine translation (kNN-MT), proposed by Khandelwal et al. (2021), has achieved many state-of-the-art results in machine translation tasks. Although effective, kNN-MT requires conducting kNN searches through the large…

Computation and Language · Computer Science · 2022-05-03 · Zhixian Yang, Renliang Sun, Xiaojun Wan

Understanding Knowledge Distillation in Non-autoregressive Machine Translation

Non-autoregressive machine translation (NAT) systems predict a sequence of output tokens in parallel, achieving substantial improvements in generation speed compared to autoregressive models. Existing NAT models usually rely on the…

Computation and Language · Computer Science · 2021-02-24 · Chunting Zhou, Graham Neubig, Jiatao Gu

Learning Light-Weight Translation Models from Deep Transformer

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong…

Computation and Language · Computer Science · 2020-12-29 · Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization

Recurrent Neural Networks (RNNs) have dominated language modeling because of their superior performance over traditional N-gram based models. In many applications, a large Recurrent Neural Network language model (RNNLM) or an ensemble of…

Computation and Language · Computer Science · 2019-04-09 · Yangyang Shi, Mei-Yuh Hwang, Xin Lei, Haoyu Sheng

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science · 2023-02-02 · Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Multilingual Neural Machine Translation with Knowledge Distillation

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving. However, traditional multilingual translation usually…

Computation and Language · Computer Science · 2019-05-01 · Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu