Quality Estimation without Human-labeled Data

Computation and Language 2021-02-09 v1

Abstract

Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems in real-world scenarios where high-quality translation is needed. While many approaches exist for quality estimation, they are based on supervised machine learning, which requires costly human-labeled data. As an alternative, we propose a technique that does not rely on examples from human annotators and instead uses synthetic training data. We train off-the-shelf architectures for supervised quality estimation on our synthetic data and show that the resulting models achieve performance comparable to that of models trained on human-annotated data, for both sentence-level and word-level prediction.

Cite

@article{arxiv.2102.04020,
  title  = {Quality Estimation without Human-labeled Data},
  author = {Yi-Lin Tuan and Ahmed El-Kishky and Adithya Renduchintala and Vishrav Chaudhary and Francisco Guzmán and Lucia Specia},
  journal= {arXiv preprint arXiv:2102.04020},
  year   = {2021}
}

Comments

Accepted by EACL 2021