Related papers: Exploring Empty Spaces: Human-in-the-Loop Data Aug…

Data Augmentation for Manipulation

The success of deep learning depends heavily on the availability of large datasets, but in robotic manipulation there are many learning problems for which such datasets do not exist. Collecting these datasets is time-consuming and…

Robotics · Computer Science 2022-07-21 Peter Mitrano, Dmitry Berenson

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer, Marc-André Kaufhold, Christian Reuter

A Survey on Data Augmentation in Large Model Era

Large models, encompassing large language and diffusion models, have shown exceptional promise in approximating human-level intelligence, garnering significant interest from both academic and industrial spheres. However, the training of…

Machine Learning · Computer Science 2024-03-05 Yue Zhou, Chenlu Guo, Xu Wang, Yi Chang, Yuan Wu

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

Data augmentation has been widely applied as an effective methodology to improve generalization in particular when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques, which indeed…

Machine Learning · Computer Science 2019-11-22 Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Augmentation Invariant Manifold Learning

Data augmentation is a widely used technique and an essential ingredient in the recent advance in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve…

Machine Learning · Statistics 2025-01-16 Shulei Wang

UniformAugment: A Search-free Probabilistic Data Augmentation Approach

Augmenting training datasets has been shown to improve the learning effectiveness for several computer vision tasks. A good augmentation produces an augmented dataset that adds variability while retaining the statistical properties of the…

Computer Vision and Pattern Recognition · Computer Science 2020-04-01 Tom Ching LingChen, Ava Khonsari, Amirreza Lashkari, Mina Rafi Nazari, Jaspreet Singh Sambee, Mario A. Nascimento

Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration

Data augmentation is a widely used strategy to improve model robustness and generalization by enriching training datasets with synthetic examples. While large language models (LLMs) have demonstrated strong generative capabilities for this…

Machine Learning · Computer Science 2025-09-29 Dongkyu Cho, Miao Zhang, Rumi Chunara

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li, Yutai Hou, Wanxiang Che

Augmentor: An Image Augmentation Library for Machine Learning

The generation of artificial data based on existing observations, known as data augmentation, is a technique used in machine learning to improve model accuracy, generalisation, and to control overfitting. Augmentor is a software package,…

Computer Vision and Pattern Recognition · Computer Science 2017-08-18 Marcus D. Bloice, Christof Stocker, Andreas Holzinger

A Comprehensive Survey on Data Augmentation

Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks…

Machine Learning · Computer Science 2025-10-16 Zaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu C. Aggarwal, Jian Pei, Yuanchun Zhou

Diversity-oriented Data Augmentation with Large Language Models

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science 2025-10-16 Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

Learning to Compose Domain-Specific Transformations for Data Augmentation

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual…

Machine Learning · Statistics 2018-12-10 Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

Augmenting Medical Imaging: A Comprehensive Catalogue of 65 Techniques for Enhanced Data Analysis

In the realm of medical imaging, the training of machine learning models necessitates a large and varied training dataset to ensure robustness and interoperability. However, acquiring such diverse and heterogeneous data can be difficult due…

Image and Video Processing · Electrical Eng. & Systems 2023-03-03 Manuel Cossio

Effective Data Augmentation With Diffusion Models

Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those from classification, generative models, and representation learning. The standard approach to data augmentation…

Computer Vision and Pattern Recognition · Computer Science 2025-06-12 Brandon Trabucco, Kyle Doherty, Max Gurinas, Ruslan Salakhutdinov

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai, Haoran Xie, Joe S. Qin

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

RandAugment: Practical automated data augmentation with a reduced search space

Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Recently, automated augmentation strategies have led to state-of-the-art results in image classification and…

Computer Vision and Pattern Recognition · Computer Science 2019-11-15 Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

Data augmentation on-the-fly and active learning in data stream classification

There is an emerging need for predictive models to be trained on-the-fly, since in numerous machine learning applications data are arriving in an online fashion. A critical challenge encountered is that of limited availability of ground…

Machine Learning · Computer Science 2025-08-25 Kleanthis Malialis, Dimitris Papatheodoulou, Stylianos Filippou, Christos G. Panayiotou, Marios M. Polycarpou

Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges

In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This…

Computation and Language · Computer Science 2024-07-03 Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty

How Important are Data Augmentations to Close the Domain Gap for Object Detection in Orbit?

We investigate the efficacy of data augmentations to close the domain gap in spaceborne computer vision, crucial for autonomous operations like on-orbit servicing. As the use of computer vision in space increases, challenges such as hostile…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Maximilian Ulmer, Leonard Klüpfel, Maximilian Durner, Rudolph Triebel