Related papers: Diversity-oriented Data Augmentation with Large La…

Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges

In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This…

Computation and Language · Computer Science 2024-07-03 Bosheng Ding , Chengwei Qin , Ruochen Zhao , Tianze Luo , Xinze Li , Guizhen Chen , Wenhan Xia , Junjie Hu , Anh Tuan Luu , Shafiq Joty

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li , Yutai Hou , Wanxiang Che

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai , Haoran Xie , Joe S. Qin

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant…

Computation and Language · Computer Science 2021-06-15 Jiaao Chen , Derek Tam , Colin Raffel , Mohit Bansal , Diyi Yang

DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

In this paper, we present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios. Recently, DMs have opened up the possibility…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Kyuheon Jung , Yongdeuk Seo , Seongwoo Cho , Jaeyoung Kim , Hyun-seok Min , Sungchul Choi

Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data

Fine-tuning large language models (LLMs) using diverse datasets is crucial for enhancing their overall performance across various domains. In practical scenarios, existing methods based on modeling the mixture proportions of data…

Computation and Language · Computer Science 2025-10-31 Zhenqing Ling , Daoyuan Chen , Liuyi Yao , Qianli Shen , Yaliang Li , Ying Shen

A Survey of Data Augmentation Approaches for NLP

Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge,…

Computation and Language · Computer Science 2021-12-03 Steven Y. Feng , Varun Gangal , Jason Wei , Sarath Chandar , Soroush Vosoughi , Teruko Mitamura , Eduard Hovy

Multimodal Large Language Models for Image, Text, and Speech Data Augmentation: A Survey

In the past five years, research has shifted from traditional Machine Learning (ML) and Deep Learning (DL) approaches to leveraging Large Language Models (LLMs) , including multimodality, for data augmentation to enhance generalization, and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Ranjan Sapkota , Shaina Raza , Maged Shoman , Achyut Paudel , Manoj Karkee

Adaptive Augmentation Policy Optimization with LLM Feedback

Data augmentation is a critical component of deep learning pipelines, enhancing model generalization by increasing dataset diversity. Traditional augmentation strategies rely on manually designed transformations, stochastic sampling, or…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Ant Duru , Alptekin Temizel

Empowering Large Language Models for Textual Data Augmentation

With the capabilities of understanding and executing natural language instructions, Large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science 2024-04-30 Yichuan Li , Kaize Ding , Jianling Wang , Kyumin Lee

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has…

Computation and Language · Computer Science 2023-06-14 Zhengxiang Shi , Aldo Lipani

The Effects of Data Augmentation on Confidence Estimation for LLMs

Confidence estimation is crucial for reflecting the reliability of large language models (LLMs), particularly in the widely used closed-source models. Utilizing data augmentation for confidence estimation is viable, but discussions focus on…

Machine Learning · Computer Science 2025-06-16 Rui Wang , Renyu Zhu , Minmin Lin , Runze Wu , Tangjie Lv , Changjie Fan , Haobo Wang

Data Augmentation for Neural NLP

Data scarcity is a problem that occurs in languages and tasks where we do not have large amounts of labeled data but want to use state-of-the-art models. Such models are often deep learning models that require a significant amount of data…

Computation and Language · Computer Science 2023-02-23 Domagoj Pluščec , Jan Šnajder

Controllable and Diverse Data Augmentation with Large Language Model for Low-Resource Open-Domain Dialogue Generation

Data augmentation (DA) is crucial to mitigate model training instability and over-fitting problems in low-resource open-domain dialogue generation. However, traditional DA methods often neglect semantic data diversity, restricting the…

Computation and Language · Computer Science 2024-04-02 Zhenhua Liu , Tong Zhu , Jianxiang Xiang , Wenliang Chen

A Survey on Data Augmentation in Large Model Era

Large models, encompassing large language and diffusion models, have shown exceptional promise in approximating human-level intelligence, garnering significant interest from both academic and industrial spheres. However, the training of…

Machine Learning · Computer Science 2024-03-05 Yue Zhou , Chenlu Guo , Xu Wang , Yi Chang , Yuan Wu

Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets

Advancements in Large Language Models (LLMs) have significantly enhanced instruction-following capabilities. However, most Instruction Fine-Tuning (IFT) datasets are predominantly in English, limiting model performance in other languages.…

Computation and Language · Computer Science 2024-07-03 Sathish Reddy Indurthi , Wenxuan Zhou , Shamil Chollampatt , Ravi Agrawal , Kaiqiang Song , Lingxiao Zhao , Chenguang Zhu

Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis

Text data augmentation is a widely used strategy for mitigating data sparsity in natural language processing (NLP), particularly in low-resource settings where limited samples hinder effective semantic modeling. While augmentation can…

Computation and Language · Computer Science 2025-07-17 Payal Bhattad , Sai Manoj Pudukotai Dinakarrao , Anju Gupta

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data…

Computation and Language · Computer Science 2023-03-21 Haixing Dai , Zhengliang Liu , Wenxiong Liao , Xiaoke Huang , Yihan Cao , Zihao Wu , Lin Zhao , Shaochen Xu , Wei Liu , Ninghao Liu , Sheng Li , Dajiang Zhu , Hongmin Cai , Lichao Sun , Quanzheng Li , Dinggang Shen , Tianming Liu , Xiang Li

Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions

Large language models (LLMs) can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high…

Computation and Language · Computer Science 2023-08-11 John Joon Young Chung , Ece Kamar , Saleema Amershi

Is augmentation effective to improve prediction in imbalanced text datasets?

Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new…

Computation and Language · Computer Science 2023-04-21 Gabriel O. Assunção , Rafael Izbicki , Marcos O. Prates