Related papers: Data Augmentation using Large Language Models: Dat…

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science · 2025-02-03 · Yaping Chai, Haoran Xie, Joe S. Qin

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science · 2022-06-28 · Bohan Li, Yutai Hou, Wanxiang Che

A Survey on Data Augmentation in Large Model Era

Large models, encompassing large language and diffusion models, have shown exceptional promise in approximating human-level intelligence, garnering significant interest from both academic and industrial spheres. However, the training of…

Machine Learning · Computer Science · 2024-03-05 · Yue Zhou, Chenlu Guo, Xu Wang, Yi Chang, Yuan Wu

Diversity-oriented Data Augmentation with Large Language Models

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science · 2025-10-16 · Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

Multimodal Large Language Models for Image, Text, and Speech Data Augmentation: A Survey

In the past five years, research has shifted from traditional Machine Learning (ML) and Deep Learning (DL) approaches to leveraging Large Language Models (LLMs), including multimodality, for data augmentation to enhance generalization, and…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Ranjan Sapkota, Shaina Raza, Maged Shoman, Achyut Paudel, Manoj Karkee

LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition

Despite the impressive capabilities of large language models (LLMs), their performance on information extraction tasks is still not entirely satisfactory. However, their remarkable rewriting capabilities and extensive world knowledge offer…

Computation and Language · Computer Science · 2024-02-23 · Junjie Ye, Nuo Xu, Yikun Wang, Jie Zhou, Qi Zhang, Tao Gui, Xuanjing Huang

A Survey on Large Language Model-based Agents for Statistics and Data Science

In recent years, data science agents powered by Large Language Models (LLMs), known as "data agents," have shown significant potential to transform the traditional data analysis paradigm. This survey provides an overview of the evolution,…

Artificial Intelligence · Computer Science · 2025-12-01 · Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

Adaptive Augmentation Policy Optimization with LLM Feedback

Data augmentation is a critical component of deep learning pipelines, enhancing model generalization by increasing dataset diversity. Traditional augmentation strategies rely on manually designed transformations, stochastic sampling, or…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-06 · Ant Duru, Alptekin Temizel

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has…

Computation and Language · Computer Science · 2023-06-14 · Zhengxiang Shi, Aldo Lipani

The Effects of Data Augmentation on Confidence Estimation for LLMs

Confidence estimation is crucial for reflecting the reliability of large language models (LLMs), particularly in the widely used closed-source models. Utilizing data augmentation for confidence estimation is viable, but discussions focus on…

Machine Learning · Computer Science · 2025-06-16 · Rui Wang, Renyu Zhu, Minmin Lin, Runze Wu, Tangjie Lv, Changjie Fan, Haobo Wang

A Survey on Data Synthesis and Augmentation for Large Language Models

The success of Large Language Models (LLMs) is inherently linked to the availability of vast, diverse, and high-quality data for training and evaluation. However, the growth rate of high-quality data is significantly outpaced by the…

Computation and Language · Computer Science · 2024-10-18 · Ke Wang, Jiahui Zhu, Minjie Ren, Zeming Liu, Shiwei Li, Zongye Zhang, Chenkai Zhang, Xiaoyu Wu, Qiqi Zhan, Qingjie Liu, Yunhong Wang

Empowering Large Language Models for Textual Data Augmentation

With the capabilities of understanding and executing natural language instructions, Large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science · 2024-04-30 · Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee

Learnings from Data Integration for Augmented Language Models

One of the limitations of large language models is that they do not have access to up-to-date, proprietary or personal data. As a result, there are multiple efforts to extend language models with techniques for accessing external data. In…

Computation and Language · Computer Science · 2023-04-11 · Alon Halevy, Jane Dwivedi-Yu

Efficient Strategy for Improving Large Language Model (LLM) Capabilities

Large Language Models (LLMs) have become a milestone in the field of artificial intelligence and natural language processing. However, their large-scale deployment remains constrained by the need for significant computational resources.…

Computation and Language · Computer Science · 2025-08-07 · Julián Camilo Velandia Gutiérrez

Controllable and Diverse Data Augmentation with Large Language Model for Low-Resource Open-Domain Dialogue Generation

Data augmentation (DA) is crucial to mitigate model training instability and over-fitting problems in low-resource open-domain dialogue generation. However, traditional DA methods often neglect semantic data diversity, restricting the…

Computation and Language · Computer Science · 2024-04-02 · Zhenhua Liu, Tong Zhu, Jianxiang Xiang, Wenliang Chen

Data Augmentation for Deep Learning Regression Tasks by Machine Learning Models

Deep learning (DL) models have gained prominence in domains such as computer vision and natural language processing but remain underutilized for regression tasks involving tabular data. In these cases, traditional machine learning (ML)…

Machine Learning · Computer Science · 2025-01-08 · Assaf Shmuel, Oren Glickman, Teddy Lazebnik

Large Language Models for Market Research: A Data-augmentation Approach

Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks. Their ability to generate human-like text has opened new possibilities for market research, particularly in…

Artificial Intelligence · Computer Science · 2026-04-20 · Mengxin Wang, Dennis J. Zhang, Heng Zhang

Data Management For Training Large Language Models: A Survey

Data plays a fundamental role in training Large Language Models (LLMs). Efficient data management, particularly in formulating a well-suited training dataset, is significant for enhancing model performance and improving training efficiency…

Computation and Language · Computer Science · 2024-08-05 · Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu

Data Augmentation for Text-based Person Retrieval Using Large Language Models

Text-based Person Retrieval (TPR) aims to retrieve person images that match the description given a text query. The performance improvement of the TPR model relies on high-quality data for supervised training. However, it is difficult to…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-21 · Zheng Li, Lijia Si, Caili Guo, Yang Yang, Qiushi Cao

Large Language Model for Qualitative Research -- A Systematic Mapping Study

The exponential growth of text-based data in domains such as healthcare, education, and social sciences has outpaced the capacity of traditional qualitative analysis methods, which are time-intensive and prone to subjectivity. Large…

Computation and Language · Computer Science · 2025-03-10 · Cauã Ferreira Barros, Bruna Borges Azevedo, Valdemar Vicente Graciano Neto, Mohamad Kassab, Marcos Kalinowski, Hugo Alexandre D. do Nascimento, Michelle C. G. S. P. Bandeira