Related papers: Good-Enough Compositional Data Augmentation

Sequence-Level Mixed Sample Data Augmentation

Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language. This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for…

Computation and Language · Computer Science · 2020-11-19 · Demi Guo, Yoon Kim, Alexander M. Rush

Improving Grammatical Error Correction via Contextual Data Augmentation

Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase…

Computation and Language · Computer Science · 2024-06-26 · Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, Wanxiang Che

Learning to Substitute Spans towards Improving Compositional Generalization

Despite the rising prevalence of neural sequence models, recent empirical evidences suggest their deficiency in compositional generalization. One of the current de-facto solutions to this problem is compositional data augmentation, aiming…

Computation and Language · Computer Science · 2023-06-06 · Zhaoyi Li, Ying Wei, Defu Lian

Compositionality as Lexical Symmetry

In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets. Many existing approaches overcome this limitation with model architectures that…

Computation and Language · Computer Science · 2023-07-06 · Ekin Akyürek, Jacob Andreas

Filtering with Confidence: When Data Augmentation Meets Conformal Prediction

With promising empirical performance across a wide range of applications, synthetic data augmentation appears a viable solution to data scarcity and the demands of increasingly data-intensive models. Its effectiveness lies in expanding the…

Machine Learning · Computer Science · 2026-02-02 · Zixuan Wu, So Won Jeong, Yating Liu, Yeo Jin Jung, Claire Donnat

Learning to Substitute Components for Compositional Generalization

Despite the rising prevalence of neural language models, recent empirical evidence suggests their deficiency in compositional generalization. One of the current de-facto solutions to this problem is compositional data augmentation, which…

Computation and Language · Computer Science · 2025-03-03 · Zhaoyi Li, Gangwei Jiang, Chenwang Wu, Ying Wei, Defu Lian, Enhong Chen

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities.…

Machine Learning · Statistics · 2022-05-23 · Elliott Gordon-Rodriguez, Thomas P. Quinn, John P. Cunningham

Simple and effective data augmentation for compositional generalization

Compositional generalization, the ability to predict complex meanings from training on simpler sentences, poses challenges for powerful pretrained seq2seq models. In this paper, we show that data augmentation methods that sample MRs and…

Computation and Language · Computer Science · 2024-01-19 · Yuekun Yao, Alexander Koller

Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions. However, existing neural models have been shown to lack this basic ability in learning symbolic…

Computation and Language · Computer Science · 2021-10-01 · Yichen Jiang, Mohit Bansal

Improving Conditioning in Context-Aware Sequence to Sequence Models

Neural sequence to sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on cases where generation is conditioned on both a short…

Computation and Language · Computer Science · 2019-11-25 · Xinyi Wang, Jason Weston, Michael Auli, Yacine Jernite

Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization

Modern semantic parsers suffer from two principal limitations. First, training requires expensive collection of utterance-program pairs. Second, semantic parsers fail to generalize at test time to new compositions/structures that have not…

Computation and Language · Computer Science · 2021-09-07 · Inbar Oren, Jonathan Herzig, Jonathan Berant

SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all…

Machine Learning · Computer Science · 2021-02-16 · Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

Compositional pre-training for neural semantic parsing

Semantic parsing is the process of translating natural language utterances into logical forms, which has many important applications such as question answering and instruction following. Sequence-to-sequence models have been very successful…

Computation and Language · Computer Science · 2019-05-29 · Amir Ziai

Towards Understanding the Relationship between In-context Learning and Compositional Generalization

According to the principle of compositional generalization, the meaning of a complex expression can be understood as a function of the meaning of its parts and of how they are combined. This principle is crucial for human language…

Computation and Language · Computer Science · 2024-03-19 · Sungjun Han, Sebastian Padó

Conditional Data Synthesis Augmentation

Reliable machine learning and statistical analysis rely on diverse, well-distributed training data. However, real-world datasets are often limited in size and exhibit underrepresentation across key subpopulations, leading to biased…

Methodology · Statistics · 2025-07-15 · Xinyu Tian, Xiaotong Shen

Improved Techniques for the Conditional Generative Augmentation of Clinical Audio Data

Data augmentation is a valuable tool for the design of deep learning systems to overcome data limitations and stabilize the training process. Especially in the medical domain, where the collection of large-scale data sets is challenging and…

Machine Learning · Computer Science · 2025-02-11 · Mane Margaryan, Matthias Seibold, Indu Joshi, Mazda Farshad, Philipp Fürnstahl, Nassir Navab

Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-11 · Salah Zaiem, Titouan Parcollet, Slim Essid

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models

In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-21 · Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg

Learning to Recombine and Resample Data for Compositional Generalization

Flexible neural sequence models outperform grammar- and automaton-based counterparts on a variety of tasks. However, neural models perform poorly in settings requiring compositional generalization beyond the training data -- particularly to…

Computation and Language · Computer Science · 2021-06-09 · Ekin Akyürek, Afra Feyza Akyürek, Jacob Andreas

Transformers as Neural Augmentors: Class Conditional Sentence Generation via Variational Bayes

Data augmentation methods for Natural Language Processing tasks are explored in recent years, however they are limited and it is hard to capture the diversity on sentence level. Besides, it is not always possible to perform data…

Computation and Language · Computer Science · 2022-05-20 · M. Şafak Bilici, Mehmet Fatih Amasyali