Related papers: Data Augmentation for Text Generation Without Any …

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve…

Computation and Language · Computer Science 2022-07-25 Markus Bayer , Marc-André Kaufhold , Björn Buchhold , Marcel Keller , Jörg Dallmeyer , Christian Reuter

Leveraging Data Augmentation for Process Information Extraction

Business Process Modeling projects often require formal process models as a central component. High costs associated with the creation of such formal process models motivated many different fields of research aimed at automated generation…

Computation and Language · Computer Science 2024-04-12 Julian Neuberger , Leonie Doll , Benedict Engelmann , Lars Ackermann , Stefan Jablonski

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer , Marc-André Kaufhold , Christian Reuter

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu , Ruiyi Zhang , Yang Zhao , Yizhe Zhang , Chunyuan Li , Changyou Chen

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through…

Computation and Language · Computer Science 2020-12-08 Ruibo Liu , Guangxuan Xu , Chenyan Jia , Weicheng Ma , Lili Wang , Soroush Vosoughi

Text Augmentation in a Multi-Task View

Traditional data augmentation aims to increase the coverage of the input distribution by generating augmented examples that strongly resemble original samples in an online fashion where augmented examples dominate training. In this paper,…

Computation and Language · Computer Science 2021-01-15 Jason Wei , Chengyu Huang , Shiqi Xu , Soroush Vosoughi

Exploring Data Augmentation for Code Generation Tasks

Advances in natural language processing, such as transfer learning from pre-trained language models, have impacted how models are trained for programming language tasks too. Previous research primarily explored code pre-training and…

Computation and Language · Computer Science 2023-02-08 Pinzhen Chen , Gerasimos Lampouras

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data…

Computation and Language · Computer Science 2023-03-21 Haixing Dai , Zhengliang Liu , Wenxiong Liao , Xiaoke Huang , Yihan Cao , Zihao Wu , Lin Zhao , Shaochen Xu , Wei Liu , Ninghao Liu , Sheng Li , Dajiang Zhu , Hongmin Cai , Lichao Sun , Quanzheng Li , Dinggang Shen , Tianming Liu , Xiang Li

Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation

Data augmentation is a technique to generate new training data based on existing data. We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. Continued training with…

Computation and Language · Computer Science 2023-06-12 Tsz Kin Lam , Shigehiko Schamoni , Stefan Riezler

Neural Data-to-Text Generation with LM-based Text Augmentation

For many new application domains for data-to-text generation, the main obstacle in training neural models consists of a lack of training data. While usually large numbers of instances are available on the data side, often only very few text…

Computation and Language · Computer Science 2021-02-09 Ernie Chang , Xiaoyu Shen , Dawei Zhu , Vera Demberg , Hui Su

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai , Haoran Xie , Joe S. Qin

Boosting Source Code Learning with Text-Oriented Data Augmentation: An Empirical Study

Recent studies have demonstrated remarkable advancements in source code learning, which applies deep neural networks (DNNs) to tackle various software engineering tasks. Similar to other DNN-based domains, source code learning also requires…

Software Engineering · Computer Science 2025-02-07 Zeming Dong , Qiang Hu , Yuejun Guo , Zhenya Zhang , Maxime Cordy , Mike Papadakis , Yves Le Traon , Jianjun Zhao

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Data augmentation is an effective technique to improve the generalization of deep neural networks. However, previous data augmentation methods usually treat the augmented samples equally without considering their individual impacts on the…

Machine Learning · Computer Science 2021-03-17 Mingyang Yi , Lu Hou , Lifeng Shang , Xin Jiang , Qun Liu , Zhi-Ming Ma

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

Data augmentation has been widely applied as an effective methodology to improve generalization in particular when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques, which indeed…

Machine Learning · Computer Science 2019-11-22 Zhuoxun He , Lingxi Xie , Xin Chen , Ya Zhang , Yanfeng Wang , Qi Tian

Text Generation with Speech Synthesis for ASR Data Augmentation

Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work mainly focuses on synthetic speech generation for ASR data…

Computation and Language · Computer Science 2023-05-29 Zhuangqun Huang , Gil Keren , Ziran Jiang , Shashank Jain , David Goss-Grubbs , Nelson Cheng , Farnaz Abtahi , Duc Le , David Zhang , Antony D'Avirro , Ethan Campbell-Taylor , Jessie Salas , Irina-Elena Veliche , Xi Chen

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Handwritten text and scene text suffer from various shapes and distorted patterns. Thus training a robust recognition model requires a large amount of data to cover diversity as much as possible. In contrast to data collection and…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Canjie Luo , Yuanzhi Zhu , Lianwen Jin , Yongpan Wang

Learning to Compose Domain-Specific Transformations for Data Augmentation

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual…

Machine Learning · Statistics 2018-12-10 Alexander J. Ratner , Henry R. Ehrenberg , Zeshan Hussain , Jared Dunnmon , Christopher Ré

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers , Michael J. Dinneen

Research Trends and Applications of Data Augmentation Algorithms

In the Machine Learning research community, there is a consensus regarding the relationship between model complexity and the required amount of data and computation power. In real world applications, these computational requirements are not…

Machine Learning · Computer Science 2022-08-03 Joao Fonseca , Fernando Bacao

Data Augmentation for Manipulation

The success of deep learning depends heavily on the availability of large datasets, but in robotic manipulation there are many learning problems for which such datasets do not exist. Collecting these datasets is time-consuming and…

Robotics · Computer Science 2022-07-21 Peter Mitrano , Dmitry Berenson