Related papers: Sequence-Level Mixed Sample Data Augmentation

200 papers

In Neural Machine Translation (NMT), data augmentation methods such as back-translation have proven their effectiveness in improving translation performance. In this paper, we propose a novel data augmentation approach for NMT, which is…

Computation and Language · Computer Science · 2022-05-11 · Chang Jin, Shigui Qiu, Nini Xiao, Hao Jia

Data augmentation is an effective approach to tackle over-fitting. Many previous works have proposed different data augmentation strategies for NLP, such as noise injection, word replacement, back-translation, etc. Though effective, they…

Computation and Language · Computer Science · 2022-07-13 · Le Zhang, Zichao Yang, Diyi Yang

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training…

Computation and Language · Computer Science · 2020-05-20 · Jacob Andreas

Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human…

Computation and Language · Computer Science · 2020-10-07 · Rongzhi Zhang, Yue Yu, Chao Zhang

For many new application domains of data-to-text generation, the main obstacle in training neural models is a lack of training data. While large numbers of instances are usually available on the data side, often only very few text…

Computation and Language · Computer Science · 2021-02-09 · Ernie Chang, Xiaoyu Shen, Dawei Zhu, Vera Demberg, Hui Su

Interpolation-based Data Augmentation (DA) methods (Mixup) linearly interpolate the inputs and labels of two or more training examples. Mixup has more recently been adapted to the field of Natural Language Processing (NLP), mainly for…

Computation and Language · Computer Science · 2023-11-17 · Yuxin Pei, Pushkar Bhuse, Zhengzhong Liu, Eric Xing
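The interpolation described in the entry above can be illustrated with a minimal sketch. It assumes the two-example case with the mixing weight drawn from Beta(alpha, alpha), as in the original Mixup formulation for images. It also assumes that inputs are already continuous vectors: for text, these would typically be embeddings rather than raw tokens, since discrete token IDs cannot be averaged meaningfully. The function names `mixup` and `sample_lambda` are hypothetical and do not come from any of the listed papers.

```python
import random
from typing import List, Sequence, Tuple


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the mixing weight lambda from Beta(alpha, alpha) (assumed prior)."""
    return rng.betavariate(alpha, alpha)


def mixup(
    x1: Sequence[float], y1: Sequence[float],
    x2: Sequence[float], y2: Sequence[float],
    lam: float,
) -> Tuple[List[float], List[float]]:
    """Linearly interpolate two inputs and their (one-hot or soft) labels.

    Returns (lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2).
    Inputs are plain float vectors here; for NLP they would typically be
    embedding vectors (a hypothetical setup, not a specific paper's method).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("paired examples must have matching shapes")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_lambda(alpha=0.2, rng=rng)
    # Two toy "embeddings" with one-hot labels for a 2-class task.
    x, y = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 4.0], [0.0, 1.0], lam)
    print(lam, x, y)
```

Interpolating the labels with the same weight as the inputs keeps the mixed label a valid distribution whenever both source labels are. A small alpha such as 0.2 pushes lambda toward 0 or 1, so most synthetic examples stay close to one of the two originals.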

In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a…

Computation and Language · Computer Science · 2018-08-29 · Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig

Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-20 · Tsz-Him Cheung, Dit-Yan Yeung

Semantic segmentation using convolutional neural networks (CNN) is a crucial component in image analysis. Training a CNN to perform semantic segmentation requires a large amount of labeled data, where the production of such labeled data is…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-26 · Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in…

Computation and Language · Computer Science · 2018-02-14 · Marzieh Fadaee, Arianna Bisazza, Christof Monz

A mixed sample data augmentation strategy is proposed to enhance the performance of models on audio scene classification, sound event classification, and speech enhancement tasks. While there have been several augmentation methods shown to…

Sound · Computer Science · 2021-08-09 · Gwantae Kim, David K. Han, Hanseok Ko

The state of the art in semantic segmentation is steadily increasing in performance, resulting in more precise and reliable segmentations in many different applications. However, progress is limited by the cost of generating labels for…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-01 · Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, Lennart Svensson

End-to-end Speech Translation is hindered by a lack of available data resources. While most of these resources are based on documents, a sentence-level version is available; however, it is single and static, potentially impeding the usefulness of…

Computation and Language · Computer Science · 2023-11-02 · Ioannis Tsiamas, José A. R. Fonollosa, Marta R. Costa-jussà

End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by back-translation or…

Computation and Language · Computer Science · 2023-06-12 · Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler

Compositional generalization, the ability to predict complex meanings from training on simpler sentences, poses challenges for powerful pretrained seq2seq models. In this paper, we show that data augmentation methods that sample MRs and…

Computation and Language · Computer Science · 2024-01-19 · Yuekun Yao, Alexander Koller

Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by…

Computation and Language · Computer Science · 2023-11-14 · Seokjin Oh, Su Ah Lee, Woohwan Jung

Sequence to sequence (SEQ2SEQ) models often lack diversity in their generated translations. This can be attributed to the limitation of SEQ2SEQ models in capturing lexical and syntactic variations in a parallel corpus resulting from…

Computation and Language · Computer Science · 2018-10-18 · Xuanli He, Gholamreza Haffari, Mohammad Norouzi

Data augmentation methods for Natural Language Processing tasks have been explored in recent years; however, they are limited, and it is hard for them to capture diversity at the sentence level. Besides, it is not always possible to perform data…

Computation and Language · Computer Science · 2022-05-20 · M. Şafak Bilici, Mehmet Fatih Amasyali

Flexible neural sequence models outperform grammar- and automaton-based counterparts on a variety of tasks. However, neural models perform poorly in settings requiring compositional generalization beyond the training data -- particularly to…

Computation and Language · Computer Science · 2021-06-09 · Ekin Akyürek, Afra Feyza Akyürek, Jacob Andreas

Semi-supervised semantic segmentation has witnessed remarkable advancements in recent years. However, existing algorithms are based on convolutional neural networks and directly applying them to Vision Transformers poses certain limitations…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Dengke Zhang, Quan Tang, Fagui Liu, Haiqing Mei, C. L. Philip Chen