Related papers: Self-Improving Diffusion Models with Synthetic Dat…

Self-Consuming Generative Models Go MAD

Seismic advances in generative AI algorithms for imagery, text, and other data types have led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose…

Machine Learning · Computer Science 2023-07-06 Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk

BlendFusion -- Scalable Synthetic Data Generation for Diffusion Model Training

With the rapid adoption of diffusion models, synthetic data generation has emerged as a promising approach for addressing the growing demand for large-scale image datasets. However, images generated purely by diffusion models often exhibit…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Thejas Venkatesh, Suguna Varshini Velury

Improved Generation of Synthetic Imaging Data Using Feature-Aligned Diffusion

Synthetic data generation is an important application of machine learning in the field of medical imaging. While existing approaches have successfully applied fine-tuned diffusion models for synthesizing medical images, we explore potential…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Lakshmi Nair

When AI Eats Itself: On the Caveats of AI Autophagy

Generative Artificial Intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources,…

Machine Learning · Computer Science 2024-11-11 Xiaodan Xing, Fadong Shi, Jiahao Huang, Yinzhe Wu, Yang Nan, Sheng Zhang, Yingying Fang, Mike Roberts, Carola-Bibiane Schönlieb, Javier Del Ser, Guang Yang

Multi-modal Synthetic Data Training and Model Collapse: Insights from VLMs and Diffusion Models

Recent research has highlighted the risk of generative model collapse, where performance progressively degrades when continually trained on self-generated data. However, existing exploration on model collapse is limited to single, unimodal…

Machine Learning · Computer Science 2025-05-15 Zizhao Hu, Mohammad Rostami, Jesse Thomason

Augmenting medical image classifiers with synthetic data from latent diffusion models

While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drug Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai

Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data

Foundation models in digital pathology use massive datasets to learn useful compact feature representations of complex histology images. However, there is limited transparency into what drives the correlation between dataset size and…

Graphics · Computer Science 2025-04-18 Ekaterina Redekop, Mara Pleasure, Vedrana Ivezic, Zichen Wang, Kimberly Flores, Anthony Sisk, William Speier, Corey Arnold

Composite Data Augmentations for Synthetic Image Detection Against Real-World Perturbations

The advent of accessible Generative AI tools enables anyone to create and spread synthetic images on social media, often with the intention to mislead, thus posing a significant threat to online information integrity. Most existing…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Efthymia Amarantidou, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis

When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class

Open-source pre-trained models hold great potential for diverse applications, but their utility declines when their training data is unavailable. Data-Free Image Synthesis (DFIS) aims to generate images that approximate the learned data…

Computer Vision and Pattern Recognition · Computer Science 2025-06-19 Yujin Kim, Hyunsoo Kim, Hyunwoo J. Kim, Suhyun Kim

Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering

As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a "self-consuming loop" that can lead to training instability or model collapse. Common…

Machine Learning · Computer Science 2025-11-18 Zhongteng Cai, Yaxuan Wang, Yang Liu, Xueru Zhang

Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing

Deep learning model effectiveness in classification tasks is often challenged by the quality and quantity of training data whenever they are affected by strong spurious correlations between specific attributes and target labels. This…

Machine Learning · Computer Science 2025-10-27 Massimiliano Ciranni, Vito Paolo Pastore, Roberto Di Via, Enzo Tartaglione, Francesca Odone, Vittorio Murino

Synthetic Data from Diffusion Models Improve Drug Discovery Prediction

Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new…

Machine Learning · Computer Science 2024-05-08 Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen

AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition

The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative;…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Parsa Rahimi, Damien Teney, Sebastien Marcel

AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually…

Computer Vision and Pattern Recognition · Computer Science 2020-08-20 Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion

Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute very different new data, diffusion models open up a new door to build…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Yijun Liang, Shweta Bhardwaj, Tianyi Zhou

Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting

Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only does synthetic data help build robust AI/ML datasets…

Systems and Control · Electrical Eng. & Systems 2026-03-20 José Pulido, Francesc Wilhelmi, Sergio Fortes, Alfonso Fernández-Durán, Lorenzo Galati Giordano, Raquel Barco

Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, their effectiveness relies on the availability of substantial training data, often lacking in…

Machine Learning · Computer Science 2025-08-01 Patricia A. Apellániz, Ana Jiménez, Borja Arroyo Galende, Juan Parras, Santiago Zazo

Multivariate Data Augmentation for Predictive Maintenance using Diffusion

Predictive maintenance has been used to optimize system repairs in the industrial, medical, and financial domains. This technique relies on the consistent ability to detect and predict anomalies in critical systems. AI models have been…

Machine Learning · Computer Science 2024-11-12 Andrew Thompson, Alexander Sommers, Alicia Russell-Gilbert, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure, Thomas Arnold, Joshua Church

Learning from Reasoning Failures via Synthetic Data Generation

Training models on synthetic data has emerged as an increasingly important strategy for improving the performance of generative AI. This approach is particularly helpful for large multimodal models (LMMs) due to the relative scarcity of…

Artificial Intelligence · Computer Science 2026-01-13 Gabriela Ben Melech Stan, Estelle Aflalo, Avinash Madasu, Vasudev Lal, Phillip Howard

On the Limitation of Diffusion Models for Synthesizing Training Datasets

Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real…

Artificial Intelligence · Computer Science 2023-11-23 Shin'ya Yamaguchi, Takuma Fukuda