Semi-Autoregressive Training Improves Mask-Predict Decoding

Computation and Language 2020-01-27 v1 Machine Learning Machine Learning

Abstract

The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of their inputs. Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
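The abstract describes two procedures: mask-predict decoding, and a training scheme whose inputs contain model predictions. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The `Predictor` interface, the function names, the linear re-masking schedule, and the uniform random masking in `make_smart_example` are hypothetical choices made for illustration. The abstract itself specifies none of these details.

```python
import random
from typing import Callable, List, Tuple

MASK = "<mask>"

# Hypothetical model interface (an assumption, not a real library API):
# given target-side tokens, some of them MASK, return one
# (token, probability) guess for every position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def mask_predict(predict: Predictor, length: int, iterations: int) -> List[str]:
    """Refine a fully masked sequence over a fixed number of passes."""
    tokens = [MASK] * length
    probs = [0.0] * length
    for t in range(iterations):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        guesses = predict(tokens)
        # Fill in only the positions that are currently masked.
        for i in masked:
            tokens[i], probs[i] = guesses[i]
        # Assumed schedule: the number of re-masked tokens decays linearly.
        n_mask = int(length * (iterations - t - 1) / iterations)
        if n_mask == 0:
            break
        # Re-mask the least confident positions for the next pass.
        for i in sorted(range(length), key=lambda j: probs[j])[:n_mask]:
            tokens[i] = MASK
    return tokens


def make_smart_example(
    predict: Predictor, gold: List[str], rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Build one training pair whose input contains model predictions."""
    n = len(gold)
    # Step 1: hide a random subset of the gold target.
    hidden = set(rng.sample(range(n), rng.randint(1, n)))
    partial = [MASK if i in hidden else tok for i, tok in enumerate(gold)]
    # Step 2: let the current model fill in the hidden positions.
    guesses = predict(partial)
    mixed = [guesses[i][0] if i in hidden else tok for i, tok in enumerate(partial)]
    # Step 3: mask a fresh random subset of the partly predicted sequence.
    remask = set(rng.sample(range(n), rng.randint(1, n)))
    inputs = [MASK if i in remask else tok for i, tok in enumerate(mixed)]
    # The training target is always the gold sequence.
    return inputs, list(gold)
```

In a real training loop, `predict` would be the conditional masked language model being trained, and step 2 would typically run without gradient tracking. Both points are assumptions of this sketch.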

Cite

@article{arxiv.2001.08785,
  title  = {Semi-Autoregressive Training Improves Mask-Predict Decoding},
  author = {Marjan Ghazvininejad and Omer Levy and Luke Zettlemoyer},
  journal= {arXiv preprint arXiv:2001.08785},
  year   = {2020}
}