Related papers: Controllable Data Augmentation for Context-Depende…

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Fine-tuning large language models (LLMs) for specific domain tasks has achieved great success in Text-to-SQL tasks. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable…

Artificial Intelligence · Computer Science 2024-11-12 Yinggang Sun, Ziming Guo, Haining Yu, Chuanyi Liu, Xiang Li, Bingxuan Wang, Xiangzhan Yu, Tiancheng Zhao

Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Data augmentation has attracted a lot of research attention in the deep learning era for its ability to alleviate data sparseness. The lack of labeled data for unseen evaluation databases is exactly the major challenge for cross-domain…

Computation and Language · Computer Science 2022-11-16 Kun Wu, Lijie Wang, Zhenghua Li, Ao Zhang, Xinyan Xiao, Hua Wu, Min Zhang, Haifeng Wang

Controllable and Diverse Data Augmentation with Large Language Model for Low-Resource Open-Domain Dialogue Generation

Data augmentation (DA) is crucial to mitigate model training instability and over-fitting problems in low-resource open-domain dialogue generation. However, traditional DA methods often neglect semantic data diversity, restricting the…

Computation and Language · Computer Science 2024-04-02 Zhenhua Liu, Tong Zhu, Jianxiang Xiang, Wenliang Chen

Data Augmentation for Conversational AI

Advancements in conversational systems have revolutionized information access, surpassing the limitations of single queries. However, developing dialogue systems requires a large amount of training data, which is a challenge in low-resource…

Computation and Language · Computer Science 2024-03-05 Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi

Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models

A challenge in mitigating social bias in fine-tuned language models (LMs) is the potential reduction in language modeling capability, which can harm downstream performance. Counterfactual data augmentation (CDA), a widely used method for…

Computation and Language · Computer Science 2026-02-11 Shweta Parihar, Liu Guangliang, Natalie Parde, Lu Cheng

TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations

Task-oriented dialogue systems have been plagued by the difficulties of obtaining large-scale and high-quality annotated conversations. Furthermore, most of the publicly available datasets only include written conversations, which are…

Computation and Language · Computer Science 2021-12-24 Xin Tian, Xinxian Huang, Dongfeng He, Yingzhan Lin, Siqi Bao, Huang He, Liankai Huang, Qiang Ju, Xiyuan Zhang, Jian Xie, Shuqi Sun, Fan Wang, Hua Wu, Haifeng Wang

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends…

Computation and Language · Computer Science 2020-10-20 Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

Leveraging QA Datasets to Improve Generative Data Augmentation

The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation. In this work, we propose CONDA, an approach to further improve GLMs'…

Computation and Language · Computer Science 2022-10-26 Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It was first widely applied in computer vision, then introduced to natural language processing, where it achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li, Yutai Hou, Wanxiang Che

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf…

Computation and Language · Computer Science 2024-04-02 Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha

Improving Grammatical Error Correction via Contextual Data Augmentation

Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase…

Computation and Language · Computer Science 2024-06-26 Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, Wanxiang Che

CoUDA: Coherence Evaluation via Unified Data Augmentation

Coherence evaluation aims to assess the organization and structure of a discourse, which remains challenging even in the era of large language models. Due to the scarcity of annotated data, data augmentation is commonly used for training…

Computation and Language · Computer Science 2024-04-02 Dawei Zhu, Wenhao Wu, Yifan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

Distributional Data Augmentation Methods for Low Resource Language

Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, recently text augmentation has emerged in…

Computation and Language · Computer Science 2023-09-12 Mosleh Mahamud, Zed Lee, Isak Samsten

Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations

We propose a novel data augmentation for labeled sentences called contextual augmentation. We assume an invariance that sentences are natural even if the words in the sentences are replaced with other words with paradigmatic relations. We…

Computation and Language · Computer Science 2018-05-17 Sosuke Kobayashi

Rethinking Data Augmentation for Robust Visual Question Answering

Data Augmentation (DA) -- generating extra training samples beyond original training set -- has been widely-used in today's unbiased VQA models to mitigate the language biases. Current mainstream DA strategies are synthetic-based methods,…

Computer Vision and Pattern Recognition · Computer Science 2022-09-16 Long Chen, Yuhang Zheng, Jun Xiao

Conditional Data Synthesis Augmentation

Reliable machine learning and statistical analysis rely on diverse, well-distributed training data. However, real-world datasets are often limited in size and exhibit underrepresentation across key subpopulations, leading to biased…

Methodology · Statistics 2025-07-15 Xinyu Tian, Xiaotong Shen

Contextual Data Augmentation for Task-Oriented Dialog Systems

Collection of annotated dialogs for training task-oriented dialog systems has been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident if…

Computation and Language · Computer Science 2023-10-17 Dustin Axman, Avik Ray, Shubham Garg, Jing Huang

Learning towards Selective Data Augmentation for Dialogue Generation

As it is cumbersome and expensive to acquire a huge amount of data for training neural dialog models, data augmentation is proposed to effectively utilize existing training samples. However, current data augmentation techniques on the…

Computation and Language · Computer Science 2023-03-20 Xiuying Chen, Mingzhe Li, Jiayi Zhang, Xiaoqiang Xia, Chen Wei, Jianwei Cui, Xin Gao, Xiangliang Zhang, Rui Yan

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation

Data augmentation is vital for deep learning neural networks. By providing massive training samples, it helps to improve the generalization ability of the model. Weakly supervised semantic segmentation (WSSS) is a challenging problem that…

Computer Vision and Pattern Recognition · Computer Science 2021-10-29 Yukun Su, Ruizhou Sun, Guosheng Lin, Qingyao Wu