Related papers: Generative Data Refinement: Just Ask for Better Da…

Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis

This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the…

Machine Learning · Computer Science 2023-08-29 Conor Hassan, Robert Salomone, Kerrie Mengersen

Synthetic Data for Model Selection

Recent breakthroughs in synthetic data generation approaches made it possible to produce highly photorealistic images which are hardly distinguishable from real ones. Furthermore, synthetic generation pipelines have the potential to…

Computer Vision and Pattern Recognition · Computer Science 2023-07-06 Alon Shoshan, Nadav Bhonker, Igor Kviatkovsky, Matan Fintz, Gerard Medioni

Your Image Generator Is Your New Private Dataset

Generative diffusion models have emerged as powerful tools to synthetically produce training data, offering potential solutions to data scarcity and reducing labelling costs for downstream supervised deep learning applications. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-09 Nicolo Resmini, Eugenio Lomurno, Cristian Sbrolli, Matteo Matteucci

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

Recent advances in generative deep learning have enabled the creation of high-quality synthetic images in text-to-image generation. Prior work shows that fine-tuning a pretrained diffusion model on ImageNet and generating synthetic training…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

Filtering with Confidence: When Data Augmentation Meets Conformal Prediction

With promising empirical performance across a wide range of applications, synthetic data augmentation appears a viable solution to data scarcity and the demands of increasingly data-intensive models. Its effectiveness lies in expanding the…

Machine Learning · Computer Science 2026-02-02 Zixuan Wu, So Won Jeong, Yating Liu, Yeo Jin Jung, Claire Donnat

Generative Models with Information-Theoretic Protection Against Membership Inference Attacks

Deep generative models, such as Generative Adversarial Networks (GANs), synthesize diverse high-fidelity data samples by estimating the underlying distribution of high dimensional data. Despite their success, GANs may disclose private…

Machine Learning · Computer Science 2022-06-02 Parisa Hassanzadeh, Robert E. Tillman

Toward Understanding Generative Data Augmentation

Generative data augmentation, which scales datasets by obtaining fake labeled examples from a trained conditional generative model, boosts classification performance in various learning tasks including (semi-)supervised learning, few-shot…

Machine Learning · Computer Science 2023-05-30 Chenyu Zheng, Guoqiang Wu, Chongxuan Li

A primer on synthetic health data

Recent advances in deep generative models have greatly expanded the potential to create realistic synthetic health datasets. These synthetic datasets aim to preserve the characteristics, patterns, and overall scientific conclusions derived…

Machine Learning · Computer Science 2024-07-04 Jennifer A Bartell, Sander Boisen Valentin, Anders Krogh, Henning Langberg, Martin Bøgsted

Assessing Generative Models for Structured Data

Synthetic tabular data generation has emerged as a promising method to address limited data availability and privacy concerns. With the sharp increase in the performance of large language models in recent years, researchers have been…

Machine Learning · Computer Science 2025-03-28 Reilly Cannon, Nicolette M. Laird, Caesar Vazquez, Andy Lin, Amy Wagler, Tony Chiang

Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility

Generative models trained with Differential Privacy (DP) can produce synthetic data while reducing privacy risks. However, navigating their privacy-utility tradeoffs makes finding the best models for specific settings/tasks challenging.…

Machine Learning · Computer Science 2024-08-30 Georgi Ganev, Kai Xu, Emiliano De Cristofaro

Image Generation From Small Datasets via Batch Statistics Adaptation

Thanks to the recent development of deep generative models, it is becoming possible to generate high-quality images with both fidelity and diversity. However, the training of such generative models requires a large dataset. To reduce the…

Computer Vision and Pattern Recognition · Computer Science 2019-10-24 Atsuhiro Noguchi, Tatsuya Harada

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work…

Machine Learning · Computer Science 2025-02-24 Reyhane Askari-Hemmat, Mohammad Pezeshki, Elvis Dohmatob, Florian Bordes, Pietro Astolfi, Melissa Hall, Jakob Verbeek, Michal Drozdzal, Adriana Romero-Soriano

Comprehensive Exploration of Synthetic Data Generation: A Survey

Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic…

Machine Learning · Computer Science 2024-02-05 André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster

Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks

Generative artificial intelligence has transformed the generation of synthetic data, providing innovative solutions to challenges like data scarcity and privacy, which are particularly critical in fields such as medicine. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-31 Eugenio Lomurno, Matteo Matteucci

Machine Learning for Synthetic Data Generation: A Review

Machine learning heavily relies on data, but real-world applications often encounter various data-related issues. These include data of poor quality, insufficient data points leading to under-fitting of machine learning models, and…

Machine Learning · Computer Science 2025-04-07 Yingzhou Lu, Lulu Chen, Yuanyuan Zhang, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei

Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models

Deep generative models (DGMs) are data-eager because learning a complex model on limited data suffers from a large variance and easily overfits. Inspired by the classical perspective of the bias-variance tradeoff, we propose regularized…

Machine Learning · Computer Science 2023-04-11 Yong Zhong, Hongtao Liu, Xiaodong Liu, Fan Bao, Weiran Shen, Chongxuan Li

Exploring the Equivalence of Closed-Set Generative and Real Data Augmentation in Image Classification

In this paper, we address a key scientific problem in machine learning: Given a training set for an image classification task, can we train a generative model on this dataset to enhance the classification performance? (i.e., closed-set…

Computer Vision and Pattern Recognition · Computer Science 2025-08-14 Haowen Wang, Guowei Zhang, Xiang Zhang, Zeyuan Chen, Haiyang Xu, Dou Hoon Kwark, Zhuowen Tu

Synthetic Data in Human Analysis: A Survey

Deep neural networks have become prevalent in human analysis, boosting the performance of applications, such as biometric recognition, action recognition, as well as person re-identification. However, the performance of such networks scales…

Computer Vision and Pattern Recognition · Computer Science 2022-08-22 Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, Antitza Dantcheva

Quality-Diversity Generative Sampling for Learning with Synthetic Data

Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating…

Computers and Society · Computer Science 2025-09-08 Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis

Controllable Image Synthesis of Industrial Data Using Stable Diffusion

Training supervised deep neural networks that perform defect detection and segmentation requires large-scale fully-annotated datasets, which can be hard or even impossible to obtain in industrial environments. Generative AI offers…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Gabriele Valvano, Antonino Agostino, Giovanni De Magistris, Antonino Graziano, Giacomo Veneri