Related papers: Self-Improving Diffusion Models with Synthetic Dat…

200 papers

Seismic advances in generative AI algorithms for imagery, text, and other data types have led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose…

With the rapid adoption of diffusion models, synthetic data generation has emerged as a promising approach for addressing the growing demand for large-scale image datasets. However, images generated purely by diffusion models often exhibit…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Thejas Venkatesh, Suguna Varshini Velury

Synthetic data generation is an important application of machine learning in the field of medical imaging. While existing approaches have successfully applied fine-tuned diffusion models for synthesizing medical images, we explore potential…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Lakshmi Nair

Generative Artificial Intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources,…

Recent research has highlighted the risk of generative model collapse, where performance progressively degrades when continually trained on self-generated data. However, existing exploration on model collapse is limited to single, unimodal…

Machine Learning · Computer Science 2025-05-15 Zizhao Hu, Mohammad Rostami, Jesse Thomason

While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drug Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai

Foundation models in digital pathology use massive datasets to learn useful compact feature representations of complex histology images. However, there is limited transparency into what drives the correlation between dataset size and…

The advent of accessible Generative AI tools enables anyone to create and spread synthetic images on social media, often with the intention to mislead, thus posing a significant threat to online information integrity. Most existing…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Efthymia Amarantidou, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis

Open-source pre-trained models hold great potential for diverse applications, but their utility declines when their training data is unavailable. Data-Free Image Synthesis (DFIS) aims to generate images that approximate the learned data…

Computer Vision and Pattern Recognition · Computer Science 2025-06-19 Yujin Kim, Hyunsoo Kim, Hyunwoo J. Kim, Suhyun Kim

As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a "self-consuming loop" that can lead to training instability or model collapse. Common…

Machine Learning · Computer Science 2025-11-18 Zhongteng Cai, Yaxuan Wang, Yang Liu, Xueru Zhang

Deep learning model effectiveness in classification tasks is often challenged by the quality and quantity of training data whenever they are affected by strong spurious correlations between specific attributes and target labels. This…

Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new…

Machine Learning · Computer Science 2024-05-08 Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen

The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative;…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Parsa Rahimi, Damien Teney, Sebastien Marcel

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually…

Computer Vision and Pattern Recognition · Computer Science 2020-08-20 Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute very different new data, diffusion models open a new door to build…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Yijun Liang, Shweta Bhardwaj, Tianyi Zhou

Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only does synthetic data help build robust AI/ML datasets…

Systems and Control · Electrical Eng. & Systems 2026-03-20 José Pulido, Francesc Wilhelmi, Sergio Fortes, Alfonso Fernández-Durán, Lorenzo Galati Giordano, Raquel Barco

While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, their effectiveness relies on the availability of substantial training data, often lacking in…

Machine Learning · Computer Science 2025-08-01 Patricia A. Apellániz, Ana Jiménez, Borja Arroyo Galende, Juan Parras, Santiago Zazo

Predictive maintenance has been used to optimize system repairs in the industrial, medical, and financial domains. This technique relies on the consistent ability to detect and predict anomalies in critical systems. AI models have been…

Training models on synthetic data has emerged as an increasingly important strategy for improving the performance of generative AI. This approach is particularly helpful for large multimodal models (LMMs) due to the relative scarcity of…

Artificial Intelligence · Computer Science 2026-01-13 Gabriela Ben Melech Stan, Estelle Aflalo, Avinash Madasu, Vasudev Lal, Phillip Howard

Synthetic samples from diffusion models are promising as replicas of real training datasets for training discriminative models. However, we found that the synthetic datasets degrade classification performance over real…

Artificial Intelligence · Computer Science 2023-11-23 Shin'ya Yamaguchi, Takuma Fukuda