Related papers: Semi-Autoregressive Training Improves Mask-Predict…

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…

Computation and Language · Computer Science · 2019-09-05 · Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

Semi-Autoregressive Neural Machine Translation

Existing approaches to neural machine translation are typically autoregressive models. While these models attain state-of-the-art translation quality, they suffer from low parallelizability and are thus slow at decoding long sequences.…

Computation and Language · Computer Science · 2018-10-30 · Chunqi Wang, Ji Zhang, Haiqing Chen

Improving Non-autoregressive Generation with Mixup Training

While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them for non-autoregressive generation tasks remains a challenge. To solve this problem, we present…

Computation and Language · Computer Science · 2021-10-22 · Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Non-autoregressive models generate target words in parallel, which speeds up decoding but sacrifices translation accuracy. To remedy flawed translations produced by non-autoregressive models, a promising approach is to…

Computation and Language · Computer Science · 2020-10-27 · Pan Xie, Zhi Cui, Xiuyin Chen, Xiaohui Hu, Jianwei Cui, Bin Wang

Hint-Based Training for Non-Autoregressive Machine Translation

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency. Non-AutoRegressive Translation…

Computation and Language · Computer Science · 2019-09-17 · Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, Tie-Yan Liu

Incorporating a Local Translation Mechanism into Non-autoregressive Translation

In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among target outputs. Specifically, for each target decoding position,…

Computation and Language · Computer Science · 2020-11-13 · Xiang Kong, Zhisong Zhang, Eduard Hovy

Masked Non-Autoregressive Image Captioning

Existing captioning models often adopt the encoder-decoder architecture, where the decoder uses autoregressive decoding to generate captions, such that each token is generated sequentially given the preceding generated tokens. However,…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-04 · Junlong Gao, Xi Meng, Shiqi Wang, Xia Li, Shanshe Wang, Siwei Ma, Wen Gao

Non-Autoregressive Translation by Learning Target Categorical Codes

Non-autoregressive Transformer is a promising text generation model. However, current non-autoregressive models still fall behind their autoregressive counterparts in translation quality. We attribute this accuracy gap to the lack of…

Computation and Language · Computer Science · 2021-03-23 · Yu Bao, Shujian Huang, Tong Xiao, Dongqi Wang, Xinyu Dai, Jiajun Chen

PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

Self-supervised pre-training, such as BERT, MASS and BART, has emerged as a powerful technique for natural language understanding and generation. Existing pre-training techniques employ autoencoding and/or autoregressive objectives to train…

Computation and Language · Computer Science · 2020-09-22 · Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

MDM-ASR: Bridging Accuracy and Efficiency in ASR with Diffusion-Based Non-Autoregressive Decoding

In sequence-to-sequence Transformer ASR, autoregressive (AR) models achieve strong accuracy but suffer from slow decoding, while non-autoregressive (NAR) models enable parallel decoding at the cost of degraded performance. We propose a…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-26 · Hao Yen, Pin-Jui Ku, Ante Jukić, Sabato Marco Siniscalchi

Exploring Unsupervised Pretraining Objectives for Machine Translation

Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence…

Computation and Language · Computer Science · 2021-06-11 · Christos Baziotis, Ivan Titov, Alexandra Birch, Barry Haddow

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian…

Computation and Language · Computer Science · 2018-06-19 · Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng

Exploring Stochastic Autoregressive Image Modeling for Visual Representation

Autoregressive language modeling (ALM) has been used successfully for self-supervised pre-training in natural language processing (NLP). However, this paradigm has not achieved results comparable to other self-supervised approaches in…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-06 · Yu Qi, Fan Yang, Yousong Zhu, Yufei Liu, Liwei Wu, Rui Zhao, Wei Li

Universal Conditional Masked Language Pre-training for Neural Machine Translation

Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Unlike prior work, where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a…

Computation and Language · Computer Science · 2022-06-03 · Pengfei Li, Liangyou Li, Meng Zhang, Minghao Wu, Qun Liu

Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation

Successful methods for unsupervised neural machine translation (UNMT) employ crosslingual pretraining via self-supervision, often in the form of a masked language modeling or a sequence generation task, which requires the model to align the…

Computation and Language · Computer Science · 2021-04-15 · Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with…

Computation and Language · Computer Science · 2020-03-02 · Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

Inference Strategies for Machine Translation with Conditional Masking

Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best…

Computation and Language · Computer Science · 2020-10-21 · Julia Kreutzer, George Foster, Colin Cherry

Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information

Non-autoregressive neural machine translation (NAT) generates each target word in parallel and has achieved promising inference acceleration. However, existing NAT models still show a large gap in translation quality compared to…

Computation and Language · Computer Science · 2020-12-17 · Qiu Ran, Yankai Lin, Peng Li, Jie Zhou

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve a significant inference speedup, but at the cost of lower accuracy compared to autoregressive…

Computation and Language · Computer Science · 2018-12-27 · Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

Improving Language Model Integration for Neural Machine Translation

The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation…

Computation and Language · Computer Science · 2023-06-09 · Christian Herold, Yingbo Gao, Mohammad Zeineldeen, Hermann Ney