Related papers: Boosting Source Code Learning with Text-Oriented D…

MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation

Inspired by the great success of Deep Neural Networks (DNNs) in natural language processing (NLP), DNNs have been increasingly applied in source code analysis and attracted significant attention from the software engineering community. Due…

Software Engineering · Computer Science 2023-01-11 Zeming Dong, Qiang Hu, Yuejun Guo, Maxime Cordy, Mike Papadakis, Zhenya Zhang, Yves Le Traon, Jianjun Zhao

Source Code Data Augmentation for Deep Learning: A Survey

The increasingly popular adoption of deep learning models in many critical source code tasks motivates the development of data augmentation (DA) techniques to enhance training data and improve various capabilities (e.g., robustness and…

Computation and Language · Computer Science 2023-11-14 Terry Yue Zhuo, Zhou Yang, Zhensu Sun, Yufei Wang, Li Li, Xiaoning Du, Zhenchang Xing, David Lo

Exploring Data Augmentation for Code Generation Tasks

Advances in natural language processing, such as transfer learning from pre-trained language models, have impacted how models are trained for programming language tasks too. Previous research primarily explored code pre-training and…

Computation and Language · Computer Science 2023-02-08 Pinzhen Chen, Gerasimos Lampouras

A Search-Based Testing Framework for Deep Neural Networks of Source Code Embedding

Over the past few years, deep neural networks (DNNs) have been continuously expanding their real-world applications for source code processing tasks across the software engineering domain, e.g., clone detection, code search, comment…

Software Engineering · Computer Science 2021-01-21 Maryam Vahdat Pour, Zhuo Li, Lei Ma, Hadi Hemmati

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer, Marc-André Kaufhold, Christian Reuter

Diversity-oriented Data Augmentation with Large Language Models

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science 2025-10-16 Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

Image Data Augmentation for Deep Learning: A Survey

Deep learning has achieved remarkable results in many computer vision tasks. Deep neural networks typically rely on large amounts of training data to avoid overfitting. However, labeled data for real-world applications may be limited. By…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Suorong Yang, Weikang Xiao, Mengchen Zhang, Suhan Guo, Jian Zhao, Furao Shen

A survey of synthetic data augmentation methods in computer vision

The standard approach to tackling computer vision problems is to train deep convolutional neural network (CNN) models using large-scale image datasets which are representative of the target task. However, in many scenarios, it is often…

Computer Vision and Pattern Recognition · Computer Science 2024-04-01 Alhassan Mumuni, Fuseini Mumuni, Nana Kobina Gerrar

Smart Augmentation - Learning an Optimal Data Augmentation Strategy

A recurring problem faced when training neural networks is that there is typically not enough data to maximize the generalization capability of deep neural networks (DNN). There are many techniques to address this, including data…

Artificial Intelligence · Computer Science 2017-04-26 Joseph Lemley, Shabab Bazrafkan, Peter Corcoran

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li, Yutai Hou, Wanxiang Che

Improving Deep Learning using Generic Data Augmentation

Deep artificial neural networks require a large corpus of training data in order to effectively learn, where collection of such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating…

Machine Learning · Computer Science 2017-08-22 Luke Taylor, Geoff Nitschke

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve…

Computation and Language · Computer Science 2022-07-25 Markus Bayer, Marc-André Kaufhold, Björn Buchhold, Marcel Keller, Jörg Dallmeyer, Christian Reuter

Data Augmentation for Text Generation Without Any Augmented Data

Data augmentation is an effective way to improve the performance of many neural text generation models. However, current data augmentation methods need to define or choose proper data mapping functions that map the original samples into the…

Computation and Language · Computer Science 2021-05-31 Wei Bi, Huayang Li, Jiacheng Huang

Image Data Augmentation Approaches: A Comprehensive Survey and Future directions

Deep learning (DL) algorithms have shown significant performance in various computer vision tasks. However, having limited labelled data leads to a network overfitting problem, where network performance is bad on unseen data as compared to…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Teerath Kumar, Alessandra Mileo, Rob Brennan, Malika Bendechache

Enhancing Source Code Representations for Deep Learning with Static Analysis

Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text,…

Software Engineering · Computer Science 2024-02-16 Xueting Guan, Christoph Treude

A General Multiple Data Augmentation Based Framework for Training Deep Neural Networks

Deep neural networks (DNNs) often rely on massive labelled data for training, which is inaccessible in many applications. Data augmentation (DA) tackles data scarcity by creating new labelled data from available ones. Different DA methods…

Neural and Evolutionary Computing · Computer Science 2022-05-31 Binyan Hu, Yu Sun, A. K. Qin

Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

In the context of neural machine translation, data augmentation (DA) techniques may be used for generating additional training samples when the available parallel data are scarce. Many DA approaches aim at expanding the support of the…

Computation and Language · Computer Science 2021-09-09 Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Data Augmentation for Deep Receivers

Deep neural networks (DNNs) allow digital receivers to learn to operate in complex environments. To do so, DNNs should preferably be trained using large labeled data sets with a similar statistical relationship as the one under which they…

Information Theory · Computer Science 2022-09-07 Tomer Raviv, Nir Shlezinger

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai, Haoran Xie, Joe S. Qin

Leveraging Data Augmentation for Process Information Extraction

Business Process Modeling projects often require formal process models as a central component. High costs associated with the creation of such formal process models motivated many different fields of research aimed at automated generation…

Computation and Language · Computer Science 2024-04-12 Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski