Related papers: Toward Understanding Generative Data Augmentation

Augmentation-Aware Self-Supervision for Data-Efficient GAN Training

Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting. Previously proposed differentiable augmentation demonstrates improved data efficiency of training GANs.…

Machine Learning · Computer Science 2023-12-29 Liang Hou, Qi Cao, Yige Yuan, Songtao Zhao, Chongyang Ma, Siyuan Pan, Pengfei Wan, Zhongyuan Wang, Huawei Shen, Xueqi Cheng

Negative Data Augmentation

Data augmentation is often used to enlarge datasets with synthetic samples generated in accordance with the underlying data distribution. To enable a wider range of augmentations, we explore negative data augmentation strategies (NDA) that…

Computer Vision and Pattern Recognition · Computer Science 2021-02-11 Abhishek Sinha, Kumar Ayush, Jiaming Song, Burak Uzkent, Hongxia Jin, Stefano Ermon

Data Augmentation Using GANs

In this paper we propose the use of Generative Adversarial Networks (GAN) to generate artificial training data for machine learning tasks. The generation of artificial training data can be extremely useful in situations such as imbalanced…

Machine Learning · Computer Science 2019-04-22 Fabio Henrique Kiyoiti dos Santos Tanaka, Claus Aranha

A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation

In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the…

Social and Information Networks · Computer Science 2024-11-13 Qikai Yang, Panfeng Li, Xinhe Xu, Zhicheng Ding, Wenjing Zhou, Yi Nian

Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning, such as in intelligent tutoring systems (ITSs). Learning performance data tend to be highly sparse (80%~90% missing…

Machine Learning · Computer Science 2025-01-07 Liang Zhang, Jionghao Lin, John Sabatini, Conrad Borchers, Daniel Weitekamp, Meng Cao, John Hollander, Xiangen Hu, Arthur C. Graesser

Tempered Adversarial Networks

Generative adversarial networks (GANs) have been shown to produce realistic samples from high-dimensional distributions, but training them is considered hard. A possible explanation for training instabilities is the inherent imbalance…

Machine Learning · Statistics 2018-07-12 Mehdi S. M. Sajjadi, Giambattista Parascandolo, Arash Mehrjou, Bernhard Schölkopf

On Data Augmentation for GAN Training

Recent successes in Generative Adversarial Networks (GAN) have affirmed the importance of using more data in GAN training. Yet it is expensive to collect data in many domains such as medical applications. Data Augmentation (DA) has been…

Computer Vision and Pattern Recognition · Computer Science 2021-02-24 Ngoc-Trung Tran, Viet-Hung Tran, Ngoc-Bao Nguyen, Trung-Kien Nguyen, Ngai-Man Cheung

Dynamic Data Augmentation with Gating Networks for Time Series Recognition

Data augmentation is a technique to improve the generalization ability of machine learning methods by increasing the size of the dataset. However, since every augmentation method is not equally effective for every dataset, you need to…

Machine Learning · Computer Science 2022-05-31 Daisuke Oba, Shinnosuke Matsuo, Brian Kenji Iwana

Gaussian and Non-Gaussian Universality of Data Augmentation

We provide universality results that quantify how data augmentation affects the variance and limiting distribution of estimates through simple surrogates, and analyze several specific models in detail. The results confirm some observations…

Machine Learning · Computer Science 2025-12-03 Kevin Han Huang, Peter Orbanz, Morgane Austern

Data Augmentation Generative Adversarial Networks

Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation alleviates this by using existing data more effectively. However…

Machine Learning · Statistics 2018-03-23 Antreas Antoniou, Amos Storkey, Harrison Edwards

The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization

Adversarial training has shown its ability in producing models that are robust to perturbations on the input data, but usually at the expense of decrease in the standard accuracy. To mitigate this issue, it is commonly believed that more…

Machine Learning · Computer Science 2020-06-09 Yifei Min, Lin Chen, Amin Karbasi

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

Scaling-based Data Augmentation for Generative Models and its Theoretical Extension

This paper studies stable learning methods for generative models that enable high-quality data generation. Noise injection is commonly used to stabilize learning. However, selecting a suitable noise distribution is challenging.…

Machine Learning · Statistics 2024-10-29 Yoshitaka Koike, Takumi Nakagawa, Hiroki Waida, Takafumi Kanamori

Generative Hints

Data augmentation is widely used in vision to introduce variation and mitigate overfitting, by enabling models to learn invariant properties. However, augmentation only indirectly captures these properties and does not explicitly constrain…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Andy Dimnaku, Abdullah Yusuf Kavranoglu, Yaser Abu-Mostafa

Boosting Model Resilience via Implicit Adversarial Data Augmentation

Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To…

Machine Learning · Computer Science 2024-06-04 Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

Improving Machine Learning Performance with Synthetic Augmentation

Synthetic augmentation is increasingly used to mitigate data scarcity in financial machine learning, yet its statistical role remains poorly understood. We formalize synthetic augmentation as a modification of the effective training…

Artificial Intelligence · Computer Science 2026-04-17 Mel Sohm, Charles Dezons, Sami Sellami, Oscar Ninou, Axel Pincon

Towards Understanding How Data Augmentation Works with Imbalanced Data

Data augmentation forms the cornerstone of many modern machine learning training pipelines; yet, the mechanisms by which it works are not clearly understood. Much of the research on data augmentation (DA) has focused on improving existing…

Machine Learning · Computer Science 2023-04-13 Damien A. Dablain, Nitesh V. Chawla

Exploring Bias in GAN-based Data Augmentation for Small Samples

For machine learning task, lacking sufficient samples mean the trained model has low confidence to approach the ground truth function. Until recently, after the generative adversarial networks (GAN) had been proposed, we see the hope of…

Machine Learning · Computer Science 2019-05-22 Mengxiao Hu, Jinlong Li

Statistical Guarantees of Group-Invariant GANs

This work presents the first statistical performance guarantees for group-invariant generative models. Many real data, such as images and molecules, are invariant to certain group symmetries, which can be taken advantage of to learn more…

Machine Learning · Statistics 2025-03-12 Ziyu Chen, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu

Boosting Statistic Learning with Synthetic Data from Pretrained Large Models

The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully…

Machine Learning · Statistics 2025-05-09 Jialong Jiang, Wenkang Hu, Jian Huang, Yuling Jiao, Xu Liu