Related papers: SwitchOut: an Efficient Data Augmentation Algorith…

Syntax-aware Data Augmentation for Neural Machine Translation

Data augmentation is an effective performance enhancement in neural machine translation (NMT) by generating additional bilingual data. In this paper, we propose a novel data augmentation enhancement strategy for neural machine translation.…

Computation and Language · Computer Science 2020-04-30 Sufeng Duan, Hai Zhao, Dongdong Zhang, Rui Wang

Soft Contextual Data Augmentation for Neural Machine Translation

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for…

Computation and Language · Computer Science 2019-05-28 Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xueqi Cheng, Tie-Yan Liu

AdMix: A Mixed Sample Data Augmentation Method for Neural Machine Translation

In Neural Machine Translation (NMT), data augmentation methods such as back-translation have proven their effectiveness in improving translation performance. In this paper, we propose a novel data augmentation approach for NMT, which is…

Computation and Language · Computer Science 2022-05-11 Chang Jin, Shigui Qiu, Nini Xiao, Hao Jia

Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables

Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs still remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this…

Computation and Language · Computer Science 2023-07-25 Ali Araabi, Vlad Niculae, Christof Monz

Dynamic Data Selection for Neural Machine Translation

Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of…

Computation and Language · Computer Science 2017-08-03 Marlies van der Wees, Arianna Bisazza, Christof Monz

Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

Neural machine translation (NMT) has recently gained widespread attention because of its high translation accuracy. However, it shows poor performance in the translation of long sentences, which is a major issue in low-resource languages.…

Computation and Language · Computer Science 2021-04-20 Seiichiro Kondo, Kengo Hotate, Masahiro Kaneko, Mamoru Komachi

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in…

Computation and Language · Computer Science 2018-02-14 Marzieh Fadaee, Arianna Bisazza, Christof Monz

Memory-augmented Neural Machine Translation

Neural machine translation (NMT) has achieved notable success in recent times, however it is also widely recognized that this approach has limitations with handling infrequent words and word pairs. This paper presents a novel…

Computation and Language · Computer Science 2017-08-08 Yang Feng, Shiyue Zhang, Andi Zhang, Dong Wang, Andrew Abel

Data Diversification: A Simple Strategy For Neural Machine Translation

We introduce Data Diversification: a simple but effective strategy to boost neural machine translation (NMT) performance. It diversifies the training data by using the predictions of multiple forward and backward models and then merging…

Computation and Language · Computer Science 2020-10-06 Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw

Sequence-Level Mixed Sample Data Augmentation

Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language. This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for…

Computation and Language · Computer Science 2020-11-19 Demi Guo, Yoon Kim, Alexander M. Rush

Multi-Source Neural Machine Translation with Data Augmentation

Multi-source translation systems translate from multiple languages to a single target language. By using information from these multiple sources, these systems achieve large gains in accuracy. To train these systems, it is necessary to have…

Computation and Language · Computer Science 2018-11-09 Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance. It consists of two procedures: bidirectional pretraining and unidirectional finetuning. Both procedures utilize SimCut,…

Computation and Language · Computer Science 2022-07-28 Pengzhi Gao, Zhongjun He, Hua Wu, Haifeng Wang

Few-shot learning through contextual data augmentation

Machine translation (MT) models used in industries with constantly changing topics, such as translation or news agencies, need to adapt to new data to maintain their performance over time. Our aim is to teach a pre-trained MT model to…

Computation and Language · Computer Science 2021-04-01 Farid Arthaud, Rachel Bawden, Alexandra Birch

Pre-Translation for Neural Machine Translation

Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine…

Computation and Language · Computer Science 2016-10-18 Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel

Code-Switching for Enhancing NMT with Pre-Specified Translation

Leveraging user-provided translation to constrain NMT has practical significance. Existing methods can be classified into two main categories, namely the use of placeholder tags for lexicon words and the use of hard constraints during…

Computation and Language · Computer Science 2019-05-17 Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, Min Zhang

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation

The principal task in supervised neural machine translation (NMT) is to learn to generate target sentences conditioned on the source inputs from a set of parallel sentence pairs, and thus produce a model capable of generalizing to unseen…

Computation and Language · Computer Science 2022-04-15 Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Weihua Luo, Jun Xie, Rong Jin

Data Augmentation for Neural Machine Translation using Generative Language Model

Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by…

Computation and Language · Computer Science 2023-11-14 Seokjin Oh, Su Ah Lee, Woohwan Jung

Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy

Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot…

Computation and Language · Computer Science 2017-09-07 Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto

Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model

This paper introduces a new data augmentation method for neural machine translation that can enforce stronger semantic consistency both within and across languages. Our method is based on Conditional Masked Language Model (CMLM) which is…

Computation and Language · Computer Science 2022-09-23 Qiao Cheng, Jin Huang, Yitao Duan

Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling

Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in natural language processing (NLP) as it has in computer vision. In this paper, we…

Computation and Language · Computer Science 2024-01-04 Himmet Toprak Kesgin, Mehmet Fatih Amasyali