Related papers: Toward Understanding Generative Data Augmentation
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting. Previously proposed differentiable augmentation demonstrates improved data efficiency of training GANs.…
Data augmentation is often used to enlarge datasets with synthetic samples generated in accordance with the underlying data distribution. To enable a wider range of augmentations, we explore negative data augmentation strategies (NDA)that…
In this paper we propose the use of Generative Adversarial Networks (GAN) to generate artificial training data for machine learning tasks. The generation of artificial training data can be extremely useful in situations such as imbalanced…
In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the…
Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning, such as in intelligent tutoring systems (ITSs). Learning performance data tend to be highly sparse (80\%\(\sim\)90\% missing…
Generative adversarial networks (GANs) have been shown to produce realistic samples from high-dimensional distributions, but training them is considered hard. A possible explanation for training instabilities is the inherent imbalance…
Recent successes in Generative Adversarial Networks (GAN) have affirmed the importance of using more data in GAN training. Yet it is expensive to collect data in many domains such as medical applications. Data Augmentation (DA) has been…
Data augmentation is a technique to improve the generalization ability of machine learning methods by increasing the size of the dataset. However, since every augmentation method is not equally effective for every dataset, you need to…
We provide universality results that quantify how data augmentation affects the variance and limiting distribution of estimates through simple surrogates, and analyze several specific models in detail. The results confirm some observations…
Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation alleviates this by using existing data more effectively. However…
Adversarial training has shown its ability in producing models that are robust to perturbations on the input data, but usually at the expense of decrease in the standard accuracy. To mitigate this issue, it is commonly believed that more…
In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…
This paper studies stable learning methods for generative models that enable high-quality data generation. Noise injection is commonly used to stabilize learning. However, selecting a suitable noise distribution is challenging.…
Data augmentation is widely used in vision to introduce variation and mitigate overfitting, by enabling models to learn invariant properties. However, augmentation only indirectly captures these properties and does not explicitly constrain…
Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To…
Synthetic augmentation is increasingly used to mitigate data scarcity in financial machine learning, yet its statistical role remains poorly understood. We formalize synthetic augmentation as a modification of the effective training…
Data augmentation forms the cornerstone of many modern machine learning training pipelines; yet, the mechanisms by which it works are not clearly understood. Much of the research on data augmentation (DA) has focused on improving existing…
For machine learning task, lacking sufficient samples mean the trained model has low confidence to approach the ground truth function. Until recently, after the generative adversarial networks (GAN) had been proposed, we see the hope of…
This work presents the first statistical performance guarantees for group-invariant generative models. Many real data, such as images and molecules, are invariant to certain group symmetries, which can be taken advantage of to learn more…
The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully…