Simulated Multiple Reference Training Improves Low-Resource Machine Translation

Huda Khayrallah; Brian Thompson; Matt Post; Philipp Koehn

doi:10.18653/v1/2020.emnlp-main.7

Computation and Language 2021-04-23 v2

Abstract

Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel MT training method that approximates the full space of possible translations by sampling a paraphrase of the reference sentence from a paraphraser and training the MT model to predict the paraphraser's distribution over possible tokens. We demonstrate the effectiveness of SMRT in low-resource settings when translating to English, with improvements of 1.2 to 7.0 BLEU. We also find SMRT is complementary to back-translation.
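The training signal described in the abstract can be illustrated with a toy sketch. It assumes the per-step objective is a cross-entropy between the paraphraser's token distribution and the MT model's token distribution, computed along a paraphrase sampled from the paraphraser; the abstract does not state the exact loss, so this is an assumption. The names `smrt_style_loss`, `toy_paraphraser`, and `toy_mt` are hypothetical stand-ins, not the authors' code.

```python
import math
import random


def sample_token(dist, rng):
    """Sample one token from a {token: probability} distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def soft_cross_entropy(teacher, student, eps=1e-12):
    """Cross-entropy H(teacher, student) over a shared vocabulary."""
    return -sum(
        p * math.log(student.get(tok, 0.0) + eps)
        for tok, p in teacher.items()
        if p > 0.0
    )


def smrt_style_loss(paraphraser_step, mt_step, max_len, rng):
    """Sample a paraphrase token by token. At each step, score the MT
    model's distribution against the paraphraser's full distribution
    (assumed objective), then extend the prefix with the sampled token."""
    prefix, total = [], 0.0
    for _ in range(max_len):
        teacher = paraphraser_step(prefix)  # in practice conditioned on the reference
        student = mt_step(prefix)           # in practice conditioned on the source
        total += soft_cross_entropy(teacher, student)
        token = sample_token(teacher, rng)
        if token == "</s>":
            break
        prefix.append(token)
    return total, prefix


# Hypothetical toy models: fixed distributions instead of neural networks.
def toy_paraphraser(prefix):
    return {"hello": 0.6, "hi": 0.4} if not prefix else {"</s>": 1.0}


def toy_mt(prefix):
    return {"hello": 0.5, "hi": 0.5} if not prefix else {"</s>": 1.0}


loss, paraphrase = smrt_style_loss(
    toy_paraphraser, toy_mt, max_len=5, rng=random.Random(0)
)
```

In this sketch the MT model is penalized against the whole distribution at each position rather than against a single reference token, which is the sense in which a single reference is made to simulate many.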

Keywords

neural machine translation; machine translation; cross-lingual transfer

Cite

@article{arxiv.2004.14524,
  title  = {Simulated Multiple Reference Training Improves Low-Resource Machine Translation},
  author = {Huda Khayrallah and Brian Thompson and Matt Post and Philipp Koehn},
  journal= {arXiv preprint arXiv:2004.14524},
  year   = {2021}
}

Comments

EMNLP 2020 camera ready
