Related papers: A Parallel Evaluation Data Set of Software Documen…

200 papers

Parallel datasets are vital for performing and evaluating any kind of multilingual task. However, in the cases where one of the considered language pairs is a low-resource language, the existing top-down parallel data such as corpora are…

Computation and Language · Computer Science 2023-09-26 Kasun Wickramasinghe, Nisansa de Silva

This paper presents a high-quality multilingual dataset for the documentation domain to advance research on localization of structured text. Unlike widely-used datasets for translation of plain text, we collect XML-structured parallel text…

Computation and Language · Computer Science 2020-06-25 Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong

In this paper, we present our work on the creation of lexical resources for Machine Translation between English and Hindi. We describe the development of phrase pair mappings for our experiments and the comparative performance…

Computation and Language · Computer Science 2017-11-13 Sreelekha S, Pushpak Bhattacharyya

Document-level machine translation conditions on surrounding sentences to produce coherent translations. There has been much recent work in this area with the introduction of custom model architectures and decoding algorithms. This paper…

Computation and Language · Computer Science 2021-01-28 Zhiyi Ma, Sergey Edunov, Michael Auli

The creation of a quality summarization dataset is an expensive, time-consuming effort, requiring the production and evaluation of summaries by both trained humans and machines. If such effort is made in one language, it would be beneficial…

Computation and Language · Computer Science 2021-12-09 Spencer Braun, Oleg Vasilyev, Neslihan Iskender, John Bohannon

The primary objective of our work is to build a large-scale English-Thai dataset for machine translation. We construct an English-Thai machine translation dataset with over 1 million segment pairs, curated from various sources, namely news,…

Computation and Language · Computer Science 2021-08-10 Lalita Lowphansirikul, Charin Polpanumas, Attapol T. Rutherford, Sarana Nutanong

Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not, as it is difficult to 1) train with a small amount of DL data; and 2) evaluate, as the main…

Computation and Language · Computer Science 2020-12-14 Matīss Rikters, Ryokan Ri, Tong Li, Toshiaki Nakazawa

Machine translation has become a critical tool in bridging linguistic gaps, especially between languages as diverse as English and Hindi. This paper comprehensively evaluates various machine translation models for translating between…

Computation and Language · Computer Science 2025-05-27 Ahan Prasannakumar Shetty

Compared to English, the amount of labeled data for Indonesian text classification tasks is very small. Recently developed multilingual language models have shown their ability to create multilingual representations effectively. This paper…

Computation and Language · Computer Science 2020-09-15 Ilham Firdausi Putra, Ayu Purwarianti

Data curation is a critical yet under-researched step in the machine translation training paradigm. To train translation systems, data acquisition relies primarily on human translations and digital parallel sources or, to a limited degree,…

Computation and Language · Computer Science 2026-03-12 Saumitra Yadav, Manish Shrivastava

Most legal text in the Indian judiciary is written in complex English due to historical reasons. However, only a small fraction of the Indian population is comfortable in reading English. Hence legal text needs to be made available in…

Computation and Language · Computer Science 2024-11-08 Sayan Mahapatra, Debtanu Datta, Shubham Soni, Adrijit Goswami, Saptarshi Ghosh

Multilingual sentence representations pose a great advantage for low-resource languages that do not have enough data to build monolingual models on their own. These multilingual sentence representations have been separately exploited by few…

Computation and Language · Computer Science 2021-06-15 Dilan Sachintha, Lakmali Piyarathna, Charith Rajitha, Surangika Ranathunga

Despite the known limitations, most machine translation systems today still operate on the sentence level. One reason for this is that most parallel training data is only sentence-level aligned, without document-level meta information…

Computation and Language · Computer Science 2023-10-20 Frithjof Petrick, Christian Herold, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to…

Computation and Language · Computer Science 2019-09-17 Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning…

Computation and Language · Computer Science 2017-09-18 Kyle Richardson, Jonas Kuhn

We present research towards bridging the language gap between migrant workers in Qatar and medical staff. In particular, we present the first steps towards the development of a real-world Hindi-English machine translation system for…

Computation and Language · Computer Science 2016-10-11 Ahmad Musleh, Nadir Durrani, Irina Temnikova, Preslav Nakov, Stephan Vogel, Osama Alsaad

Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis…

It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation. It is therefore unfortunate that machine translation…

Computation and Language · Computer Science 2024-05-17 Matt Post, Marcin Junczys-Dowmunt

Parallel corpora play an important role in training machine translation (MT) models, particularly for low-resource languages where high-quality bilingual data is scarce. This review provides a comprehensive overview of available parallel…

Computation and Language · Computer Science 2025-04-23 Rahul Raja, Arpita Vats

Several recent papers claim human parity at sentence-level Machine Translation (MT), especially in high-resource languages. Thus, in response, the MT community has, in part, shifted its focus to document-level translation. Translating…

Computation and Language · Computer Science 2023-05-19 Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell