Related papers: STA: Self-controlled Text Augmentation for Improvi…

Selective Text Augmentation with Word Roles for Low-Resource Text Classification

Data augmentation techniques are widely used in text classification tasks to improve the performance of classifiers, especially in low-resource scenarios. Most previous methods conduct text augmentation without considering the different…

Computation and Language · Computer Science 2022-09-07 Biyang Guo, Songqiao Han, Hailiang Huang

What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

Text augmentation techniques are widely used in text classification problems to improve the performance of classifiers, especially in low-resource scenarios. Whilst lots of creative text augmentation methods have been designed, they augment…

Computation and Language · Computer Science 2021-09-02 Biyang Guo, Songqiao Han, Hailiang Huang

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-resource or class-imbalanced situations. Traditional methods first devise task-specific operations such as Synonym Substitute, then preset the…

Computation and Language · Computer Science 2021-09-03 Shuhuai Ren, Jinchao Zhang, Lei Li, Xu Sun, Jie Zhou

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve…

Computation and Language · Computer Science 2022-07-25 Markus Bayer, Marc-André Kaufhold, Björn Buchhold, Marcel Keller, Jörg Dallmeyer, Christian Reuter

STraTA: Self-Training with Task Augmentation for Better Few-shot Learning

Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available. To address this shortcoming, we propose…

Computation and Language · Computer Science 2022-04-13 Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer

Improved Text Classification via Test-Time Augmentation

Test-time augmentation -- the aggregation of predictions across transformed examples of test inputs -- is an established technique to improve the performance of image classification models. Importantly, TTA can be used to improve model…

Machine Learning · Computer Science 2022-06-29 Helen Lu, Divya Shanmugam, Harini Suresh, John Guttag

Distributional Data Augmentation Methods for Low Resource Language

Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, recently text augmentation has emerged in…

Computation and Language · Computer Science 2023-09-12 Mosleh Mahamud, Zed Lee, Isak Samsten

Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum Strategies

This study conducts a thorough evaluation of text augmentation techniques across a variety of datasets and natural language processing (NLP) tasks to address the lack of reliable, generalized evidence for these methods. It examines the…

Computation and Language · Computer Science 2024-02-15 Himmet Toprak Kesgin, Mehmet Fatih Amasyali

Not Enough Data? Deep Learning to the Rescue!

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially…

Computation and Language · Computer Science 2019-11-28 Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling

Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents

Data availability is a bottleneck during early stages of development of new capabilities for intelligent artificial agents. We investigate the use of text generation techniques to augment the training data of a popular commercial artificial…

Computation and Language · Computer Science 2019-10-09 Nikolaos Malandrakis, Minmin Shen, Anuj Goyal, Shuyang Gao, Abhishek Sethi, Angeliki Metallinou

DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation

Self-training (ST) has prospered again in language understanding by augmenting the fine-tuning of pre-trained language models when labeled data is insufficient. However, it remains challenging to incorporate ST into attribute-controllable…

Computation and Language · Computer Science 2023-06-07 Yuxi Feng, Xiaoyuan Yi, Xiting Wang, Laks V. S. Lakshmanan, Xing Xie

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer, Marc-André Kaufhold, Christian Reuter

On Evaluation Protocols for Data Augmentation in a Limited Data Scenario

Textual data augmentation (DA) is a prolific field of study where novel techniques to create artificial data are regularly proposed, and that has demonstrated great efficiency on small data settings, at least for text classification tasks.…

Computation and Language · Computer Science 2024-09-18 Frédéric Piedboeuf, Philippe Langlais

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has…

Computation and Language · Computer Science 2023-06-14 Zhengxiang Shi, Aldo Lipani

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li, Yutai Hou, Wanxiang Che

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data…

Computation and Language · Computer Science 2023-03-21 Haixing Dai, Zhengliang Liu, Wenxiong Liao, Xiaoke Huang, Yihan Cao, Zihao Wu, Lin Zhao, Shaochen Xu, Wei Liu, Ninghao Liu, Sheng Li, Dajiang Zhu, Hongmin Cai, Lichao Sun, Quanzheng Li, Dinggang Shen, Tianming Liu, Xiang Li

Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs

In practice, it is common to find oneself with far too little text data to train a deep neural network. This "Big Data Wall" represents a challenge for minority language communities on the Internet, organizations, laboratories and companies…

Computation and Language · Computer Science 2018-12-13 Claude Coulombe

Empowering Large Language Models for Textual Data Augmentation

With the capabilities of understanding and executing natural language instructions, Large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science 2024-04-30 Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee

Leveraging Data Augmentation for Process Information Extraction

Business Process Modeling projects often require formal process models as a central component. High costs associated with the creation of such formal process models motivated many different fields of research aimed at automated generation…

Computation and Language · Computer Science 2024-04-12 Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski