Related papers: BDA: Bangla Text Data Augmentation Framework

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through…

Computation and Language · Computer Science 2020-12-08 Ruibo Liu, Guangxuan Xu, Chenyan Jia, Weicheng Ma, Lili Wang, Soroush Vosoughi

Addressing Data Scarcity in Bangla Fake News Detection: An LLM-Based Dataset Augmentation Approach

The growing spread of misinformation in digital media highlights the need for reliable fake news detection systems, yet progress in under-resourced languages such as Bangla is limited by small and imbalanced datasets. This study…

Computation and Language · Computer Science 2026-05-05 Ahmed Alfey Sani, Kazi Akib Zaoad, Shefayat E Shams Adib, Md Abdul Muqtadir, Ajwad Abrar

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li, Yutai Hou, Wanxiang Che

IndiText Boost: Text Augmentation for Low Resource India Languages

Text Augmentation is an important task for low-resource languages. It helps deal with the problem of data scarcity. A data augmentation strategy is used to deal with the problem of data scarcity. Through the years, much work has been done…

Computation and Language · Computer Science 2024-01-25 Onkar Litake, Niraj Yagnik, Shreyas Labhsetwar

Data Augmentation for Traffic Classification

Data Augmentation (DA) -- enriching training data by adding synthetic samples -- is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks to improve model performance. Yet, DA has struggled to gain…

Machine Learning · Computer Science 2024-01-24 Chao Wang, Alessandro Finamore, Pietro Michiardi, Massimo Gallo, Dario Rossi

Not Enough Data? Deep Learning to the Rescue!

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially…

Computation and Language · Computer Science 2019-11-28 Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Distributional Data Augmentation Methods for Low Resource Language

Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, recently text augmentation has emerged in…

Computation and Language · Computer Science 2023-09-12 Mosleh Mahamud, Zed Lee, Isak Samsten

Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

In the context of neural machine translation, data augmentation (DA) techniques may be used for generating additional training samples when the available parallel data are scarce. Many DA approaches aim at expanding the support of the…

Computation and Language · Computer Science 2021-09-09 Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has…

Computation and Language · Computer Science 2023-06-14 Zhengxiang Shi, Aldo Lipani

Data Augmentation using Transformers and Similarity Measures for Improving Arabic Text Classification

The performance of learning models heavily relies on the availability and adequacy of training data. To address the dataset adequacy issue, researchers have extensively explored data augmentation (DA) as a promising approach. DA generates…

Computation and Language · Computer Science 2023-08-22 Dania Refai, Saleh Abo-Soud, Mohammad Abdel-Rahman

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve…

Computation and Language · Computer Science 2022-07-25 Markus Bayer, Marc-André Kaufhold, Björn Buchhold, Marcel Keller, Jörg Dallmeyer, Christian Reuter

On Evaluation Protocols for Data Augmentation in a Limited Data Scenario

Textual data augmentation (DA) is a prolific field of study where novel techniques to create artificial data are regularly proposed, and that has demonstrated great efficiency on small data settings, at least for text classification tasks.…

Computation and Language · Computer Science 2024-09-18 Frédéric Piedboeuf, Philippe Langlais

Leveraging Data Augmentation for Process Information Extraction

Business Process Modeling projects often require formal process models as a central component. High costs associated with the creation of such formal process models motivated many different fields of research aimed at automated generation…

Computation and Language · Computer Science 2024-04-12 Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer, Marc-André Kaufhold, Christian Reuter

FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning

Most previous methods for text data augmentation are limited to simple tasks and weak baselines. We explore data augmentation on hard tasks (i.e., few-shot natural language understanding) and strong baselines (i.e., pretrained models with…

Computation and Language · Computer Science 2022-03-16 Jing Zhou, Yanan Zheng, Jie Tang, Jian Li, Zhilin Yang

A Unified Framework for Generative Data Augmentation: A Comprehensive Survey

Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications. This thesis presents a comprehensive survey and unified framework of the GDA landscape. We first provide an…

Machine Learning · Computer Science 2024-04-23 Yunhao Chen, Zihui Yan, Yunjie Zhu

From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation

The linguistic diversity across the African continent presents different challenges and opportunities for machine translation. This study explores the effects of data augmentation techniques in improving translation systems in low-resource…

Computation and Language · Computer Science 2025-10-21 Mardiyyah Oduwole, Oluwatosin Olajide, Jamiu Suleiman, Faith Hunja, Busayo Awobade, Fatimo Adebanjo, Comfort Akanni, Chinonyelum Igwe, Peace Ododo, Promise Omoigui, Abraham Owodunni, Steven Kolawole

BSDA: Bayesian Random Semantic Data Augmentation for Medical Image Classification

Data augmentation is a crucial regularization technique for deep neural networks, particularly in medical image classification. Mainstream data augmentation (DA) methods are usually applied at the image level. Due to the specificity and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Yaoyao Zhu, Xiuding Cai, Xueyao Wang, Xiaoqing Chen, Yu Yao, Zhongliang Fu

Diversity-oriented Data Augmentation with Large Language Models

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science 2025-10-16 Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou