Related papers: Augmenty: A Python Library for Structured Text Aug…

Augmentor: An Image Augmentation Library for Machine Learning

The generation of artificial data based on existing observations, known as data augmentation, is a technique used in machine learning to improve model accuracy, generalisation, and to control overfitting. Augmentor is a software package,…

Computer Vision and Pattern Recognition · Computer Science 2017-08-18 Marcus D. Bloice , Christof Stocker , Andreas Holzinger

Augraphy: A Data Augmentation Library for Document Images

This paper introduces Augraphy, a Python library for constructing data augmentation pipelines which produce distortions commonly seen in real-world document image datasets. Augraphy stands apart from other data augmentation tools by…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Alexander Groleau , Kok Wei Chee , Stefan Larson , Samay Maini , Jonathan Boarman

AugLy: Data Augmentations for Robustness

We introduce AugLy, a data augmentation library with a focus on adversarial robustness. AugLy provides a wide array of augmentations for multiple modalities (audio, image, text, & video). These augmentations were inspired by those that real…

Artificial Intelligence · Computer Science 2022-01-19 Zoe Papakipos , Joanna Bitton

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new…

Computation and Language · Computer Science 2022-10-14 Kaustubh D. Dhole , Varun Gangal , Sebastian Gehrmann , Aadesh Gupta , Zhenhao Li , Saad Mahamood , Abinaya Mahendiran , Simon Mille , Ashish Shrivastava , Samson Tan , Tongshuang Wu , Jascha Sohl-Dickstein , Jinho D. Choi , Eduard Hovy , Ondrej Dusek , Sebastian Ruder , Sajant Anand , Nagender Aneja , Rabin Banjade , Lisa Barthe , Hanna Behnke , Ian Berlot-Attwell , Connor Boyle , Caroline Brun , Marco Antonio Sobrevilla Cabezudo , Samuel Cahyawijaya , Emile Chapuis , Wanxiang Che , Mukund Choudhary , Christian Clauss , Pierre Colombo , Filip Cornell , Gautier Dagan , Mayukh Das , Tanay Dixit , Thomas Dopierre , Paul-Alexis Dray , Suchitra Dubey , Tatiana Ekeinhor , Marco Di Giovanni , Tanya Goyal , Rishabh Gupta , Rishabh Gupta , Louanes Hamla , Sang Han , Fabrice Harel-Canada , Antoine Honore , Ishan Jindal , Przemyslaw K. Joniak , Denis Kleyko , Venelin Kovatchev , Kalpesh Krishna , Ashutosh Kumar , Stefan Langer , Seungjae Ryan Lee , Corey James Levinson , Hualou Liang , Kaizhao Liang , Zhexiong Liu , Andrey Lukyanenko , Vukosi Marivate , Gerard de Melo , Simon Meoni , Maxime Meyer , Afnan Mir , Nafise Sadat Moosavi , Niklas Muennighoff , Timothy Sum Hon Mun , Kenton Murray , Marcin Namysl , Maria Obedkova , Priti Oli , Nivranshu Pasricha , Jan Pfister , Richard Plant , Vinay Prabhu , Vasile Pais , Libo Qin , Shahab Raji , Pawan Kumar Rajpoot , Vikas Raunak , Roy Rinberg , Nicolas Roberts , Juan Diego Rodriguez , Claude Roux , Vasconcellos P. H. S. , Ananya B. Sai , Robin M. Schmidt , Thomas Scialom , Tshephisho Sefara , Saqib N. Shamsi , Xudong Shen , Haoyue Shi , Yiwen Shi , Anna Shvets , Nick Siegel , Damien Sileo , Jamie Simon , Chandan Singh , Roman Sitelew , Priyank Soni , Taylor Sorensen , William Soto , Aman Srivastava , KV Aditya Srivatsa , Tony Sun , Mukund Varma T , A Tabassum , Fiona Anting Tan , Ryan Teehan , Mo Tiwari , Marie Tolkiehn , Athena Wang , Zijian Wang , Gloria Wang , Zijie J. Wang , Fuxuan Wei , Bryan Wilie , Genta Indra Winata , Xinyi Wu , Witold Wydmański , Tianbao Xie , Usama Yaseen , Michael A. Yee , Jing Zhang , Yue Zhang

AugmenTory: A Fast and Flexible Polygon Augmentation Library

Data augmentation is a key technique for addressing the challenge of limited datasets, which have become a major component in the training procedures of image processing. Techniques such as geometric transformations and color space…

Computer Vision and Pattern Recognition · Computer Science 2024-05-08 Tanaz Ghahremani , Mohammad Hoseyni , Mohammad Javad Ahmadi , Pouria Mehrabi , Amirhossein Nikoofard

Named Entity Recognition for Social Media Texts with Semantic Augmentation

Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this…

Computation and Language · Computer Science 2020-10-30 Yuyang Nie , Yuanhe Tian , Xiang Wan , Yan Song , Bo Dai

AugCSE: Contrastive Sentence Embedding with Diverse Augmentations

Data augmentation techniques have been proven useful in many applications in NLP fields. Most augmentations are task-specific, and cannot be used as a general-purpose tool. In our work, we present AugCSE, a unified framework to utilize…

Computation and Language · Computer Science 2022-10-26 Zilu Tang , Muhammed Yusuf Kocyigit , Derry Wijaya

Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks

Data augmentation techniques have been used to alleviate the problem of scarce labeled data in various NER tasks (flat, nested, and discontinuous NER tasks). Existing augmentation techniques either manipulate the words in the original text…

Computation and Language · Computer Science 2023-05-29 Xuming Hu , Yong Jiang , Aiwei Liu , Zhongqiang Huang , Pengjun Xie , Fei Huang , Lijie Wen , Philip S. Yu

Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation

Audio data augmentation is a key step in training deep neural networks for solving audio classification tasks. In this paper, we introduce Audiogmenter, a novel audio data augmentation library in MATLAB. We provide 15 different augmentation…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-04 Gianluca Maguolo , Michelangelo Paci , Loris Nanni , Ludovico Bonan

Albumentations: fast and flexible image augmentations

Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In computer vision domain, image augmentations have…

Computer Vision and Pattern Recognition · Computer Science 2020-02-27 Alexander Buslaev , Alex Parinov , Eugene Khvedchenya , Vladimir I. Iglovikov , Alexandr A. Kalinin

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer , Marc-André Kaufhold , Christian Reuter

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Handwritten text and scene text suffer from various shapes and distorted patterns. Thus training a robust recognition model requires a large amount of data to cover diversity as much as possible. In contrast to data collection and…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Canjie Luo , Yuanzhi Zhu , Lianwen Jin , Yongpan Wang

PAGE: Prompt Augmentation for text Generation Enhancement

In recent years, natural language generative models have shown outstanding performance in text generation tasks. However, when facing specific tasks or particular requirements, they may exhibit poor performance or require adjustments that…

Computation and Language · Computer Science 2025-10-17 Mauro Jose Pacchiotti , Luciana Ballejos , Mariel Ale

Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition

While the abundance of rich and vast datasets across numerous fields has facilitated the advancement of natural language processing, sectors in need of specialized data types continue to struggle with the challenge of finding quality data.…

Computation and Language · Computer Science 2026-02-06 Hyeonseok Kang , Hyein Seo , Jeesu Jung , Sangkeun Jung , Du-Seong Chang , Riwoo Chung

Syntax-driven Data Augmentation for Named Entity Recognition

In low resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level…

Computation and Language · Computer Science 2022-10-04 Arie Pratama Sutiono , Gus Hahn-Powell

Empowering Large Language Models for Textual Data Augmentation

With the capabilities of understanding and executing natural language instructions, Large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science 2024-04-30 Yichuan Li , Kaize Ding , Jianling Wang , Kyumin Lee

To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Data-hungry deep neural networks have established themselves as the standard for many NLP tasks including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind…

Computation and Language · Computer Science 2021-11-19 Gözde Gül Şahin

Auctus: A Dataset Search Engine for Data Augmentation

The large volumes of structured data currently available, from Web tables to open-data portals and enterprise data, open up new opportunities for progress in answering many important scientific, societal, and business questions. However,…

Information Retrieval · Computer Science 2021-09-01 Sonia Castelo , Rémi Rampin , Aécio Santos , Aline Bessa , Fernando Chirigati , Juliana Freire

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai , Haoran Xie , Joe S. Qin

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data…

Computation and Language · Computer Science 2023-03-21 Haixing Dai , Zhengliang Liu , Wenxiong Liao , Xiaoke Huang , Yihan Cao , Zihao Wu , Lin Zhao , Shaochen Xu , Wei Liu , Ninghao Liu , Sheng Li , Dajiang Zhu , Hongmin Cai , Lichao Sun , Quanzheng Li , Dinggang Shen , Tianming Liu , Xiang Li