Related papers: Sequence-Level Mixed Sample Data Augmentation

AdMix: A Mixed Sample Data Augmentation Method for Neural Machine Translation

In Neural Machine Translation (NMT), data augmentation methods such as back-translation have proven their effectiveness in improving translation performance. In this paper, we propose a novel data augmentation approach for NMT, which is…

Computation and Language · Computer Science · 2022-05-11 · Chang Jin, Shigui Qiu, Nini Xiao, Hao Jia

TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding

Data augmentation is an effective approach to tackle over-fitting. Many previous works have proposed different data augmentation strategies for NLP, such as noise injection, word replacement, and back-translation. Though effective, they…

Computation and Language · Computer Science · 2022-07-13 · Le Zhang, Zichao Yang, Diyi Yang

Good-Enough Compositional Data Augmentation

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training…

Computation and Language · Computer Science · 2020-05-20 · Jacob Andreas

SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup

Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human…

Computation and Language · Computer Science · 2020-10-07 · Rongzhi Zhang, Yue Yu, Chao Zhang

Neural Data-to-Text Generation with LM-based Text Augmentation

In many new application domains for data-to-text generation, the main obstacle to training neural models is a lack of training data. While usually large numbers of instances are available on the data side, often only very few text…

Computation and Language · Computer Science · 2021-02-09 · Ernie Chang, Xiaoyu Shen, Dawei Zhu, Vera Demberg, Hui Su

SegMix: A Simple Structure-Aware Data Augmentation Method

Interpolation-based Data Augmentation (DA) methods (Mixup) linearly interpolate the inputs and labels of two or more training examples. Mixup has more recently been adapted to the field of Natural Language Processing (NLP), mainly for…

Computation and Language · Computer Science · 2023-11-17 · Yuxin Pei, Pushkar Bhuse, Zhengzhong Liu, Eric Xing
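The SegMix abstract above summarizes Mixup as a linear interpolation of the inputs and labels of training examples. The sketch below illustrates that general idea for a pair of examples. It is not SegMix's structure-aware method. The mixing weight λ is drawn from Beta(α, α), as in the original Mixup formulation. The following choices are assumptions for illustration: inputs are represented as fixed-length float vectors (for example, sentence embeddings, since raw tokens cannot be interpolated directly), labels are one-hot vectors, and the helper names `interpolate` and `mixup_pair` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> List[float]:
    """Return the element-wise convex combination lam * a + (1 - lam) * b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mixup_pair(
    x_a: Sequence[float], y_a: Sequence[float],
    x_b: Sequence[float], y_b: Sequence[float],
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Mix two (input, label) pairs with one shared weight lam ~ Beta(alpha, alpha).

    Hypothetical helper: inputs are assumed to be float vectors (e.g. embeddings)
    and labels one-hot vectors, so the mixed label is a soft label.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # small alpha puts most mass near 0 or 1
    x_mix = interpolate(x_a, x_b, lam)   # same lam for inputs ...
    y_mix = interpolate(y_a, y_b, lam)   # ... and labels
    return x_mix, y_mix, lam


if __name__ == "__main__":
    # Two toy "embeddings" with one-hot labels for a 2-class task.
    x, y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                           rng=random.Random(0))
    print(f"lam={lam:.3f} x={x} y={y}")
```

Because the same λ weights both the inputs and the labels, the mixed label remains a valid probability distribution whenever both source labels are. A small α keeps most synthetic examples close to one of the two originals.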

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a…

Computation and Language · Computer Science · 2018-08-29 · Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig

TransformMix: Learning Transformation and Mixing Strategies from Data

Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-20 · Tsz-Him Cheung, Dit-Yan Yeung

Mask-based Data Augmentation for Semi-supervised Semantic Segmentation

Semantic segmentation using convolutional neural networks (CNN) is a crucial component in image analysis. Training a CNN to perform semantic segmentation requires a large amount of labeled data, where the production of such labeled data is…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-26 · Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in…

Computation and Language · Computer Science · 2018-02-14 · Marzieh Fadaee, Arianna Bisazza, Christof Monz

SpecMix: A Mixed Sample Data Augmentation Method for Training with Time-Frequency Domain Features

A mixed sample data augmentation strategy is proposed to enhance the performance of models on audio scene classification, sound event classification, and speech enhancement tasks. While there have been several augmentation methods shown to…

Sound · Computer Science · 2021-08-09 · Gwantae Kim, David K. Han, Hanseok Ko

ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning

The state of the art in semantic segmentation is steadily increasing in performance, resulting in more precise and reliable segmentations in many different applications. However, progress is limited by the cost of generating labels for…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-01 · Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, Lennart Svensson

SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

End-to-end Speech Translation is hindered by a lack of available data resources. While most of them are based on documents, a sentence-level version is available; however, it is single and static, potentially impeding the usefulness of…

Computation and Language · Computer Science · 2023-11-02 · Ioannis Tsiamas, José A. R. Fonollosa, Marta R. Costa-jussà

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by back-translation or…

Computation and Language · Computer Science · 2023-06-12 · Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler

Simple and effective data augmentation for compositional generalization

Compositional generalization, the ability to predict complex meanings from training on simpler sentences, poses challenges for powerful pretrained seq2seq models. In this paper, we show that data augmentation methods that sample MRs and…

Computation and Language · Computer Science · 2024-01-19 · Yuekun Yao, Alexander Koller

Data Augmentation for Neural Machine Translation using Generative Language Model

Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by…

Computation and Language · Computer Science · 2023-11-14 · Seokjin Oh, Su Ah Lee, Woohwan Jung

Sequence to Sequence Mixture Model for Diverse Machine Translation

Sequence to sequence (SEQ2SEQ) models often lack diversity in their generated translations. This can be attributed to the limitation of SEQ2SEQ models in capturing lexical and syntactic variations in a parallel corpus resulting from…

Computation and Language · Computer Science · 2018-10-18 · Xuanli He, Gholamreza Haffari, Mohammad Norouzi

Transformers as Neural Augmentors: Class Conditional Sentence Generation via Variational Bayes

Data augmentation methods for Natural Language Processing tasks have been explored in recent years; however, they are limited, and it is hard to capture diversity at the sentence level. Besides, it is not always possible to perform data…

Computation and Language · Computer Science · 2022-05-20 · M. Şafak Bilici, Mehmet Fatih Amasyali

Learning to Recombine and Resample Data for Compositional Generalization

Flexible neural sequence models outperform grammar- and automaton-based counterparts on a variety of tasks. However, neural models perform poorly in settings requiring compositional generalization beyond the training data -- particularly to…

Computation and Language · Computer Science · 2021-06-09 · Ekin Akyürek, Afra Feyza Akyürek, Jacob Andreas

Exploring Token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation

Semi-supervised semantic segmentation has witnessed remarkable advancements in recent years. However, existing algorithms are based on convolutional neural networks, and directly applying them to Vision Transformers poses certain limitations…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Dengke Zhang, Quan Tang, Fagui Liu, Haiqing Mei, C. L. Philip Chen