Related papers: Extracting Training Data from Unconditional Diffus…

SIDE: Surrogate Conditional Data Extraction from Diffusion Models

As diffusion probabilistic models (DPMs) become central to Generative AI (GenAI), understanding their memorization behavior is essential for evaluating risks such as data leakage, copyright infringement, and trustworthiness. While prior…

Machine Learning · Computer Science 2025-08-04 Yunhao Chen, Shujie Wang, Difan Zou, Xingjun Ma

On Memorization in Diffusion Models

Due to their capacity to generate novel and high-quality samples, diffusion models have attracted significant research interest in recent years. Notably, the typical training objective of diffusion models, i.e., denoising score matching,…

Machine Learning · Computer Science 2025-02-21 Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Ye Wang

On the Edge of Memorization in Diffusion Models

When do diffusion models reproduce their training data, and when are they able to generate samples beyond it? A practically relevant theoretical understanding of this interplay between memorization and generalization may significantly…

Machine Learning · Computer Science 2025-08-26 Sam Buchanan, Druv Pai, Yi Ma, Valentin De Bortoli

Investigating Memorization in Video Diffusion Models

Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content. While prior…

Computer Vision and Pattern Recognition · Computer Science 2025-04-28 Chen Chen, Enhuai Liu, Daochang Liu, Mubarak Shah, Chang Xu

Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects

Autoregressive language models (ARMs) have been shown to memorize and occasionally reproduce training data verbatim, raising concerns about privacy and copyright liability. Diffusion language models (DLMs) have recently emerged as a…

Computation and Language · Computer Science 2026-03-04 Xiaoyu Luo, Wenrui Yu, Qiongxiu Li, Johannes Bjerva

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

Recent breakthroughs in diffusion models have exhibited exceptional image-generation capabilities. However, studies show that some outputs are merely replications of training data. Such replications present potential legal challenges for…

Computer Vision and Pattern Recognition · Computer Science 2024-08-01 Yuxin Wen, Yuchen Liu, Chen Chen, Lingjuan Lyu

Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data

AI models present a wide range of applications in the field of medicine. However, achieving optimal performance requires access to extensive healthcare data, which is often not readily available. Furthermore, the imperative to preserve…

Image and Video Processing · Electrical Eng. & Systems 2025-01-09 Salman Ul Hassan Dar, Marvin Seyfarth, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Robert Malte Siepmann, Fabian Christopher Laqua, Jannik Kahmann, Norbert Frey, Bettina Baeßler, Sebastian Foersch, Daniel Truhn, Jakob Nikolas Kather, Sandy Engelhardt

Towards Memorization-Free Diffusion Models

Pretrained diffusion models and their outputs are widely accessible due to their exceptional capacity for synthesizing high-quality images and their open-source nature. The users, however, may face litigation risks owing to the models'…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Chen Chen, Daochang Liu, Chang Xu

Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted

Large-scale text-to-image diffusion models excel in generating high-quality images from textual inputs, yet concerns arise as research indicates their tendency to memorize and replicate training data, raising… We also addressed the issue of…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Ruchika Chavhan, Ondrej Bohdal, Yongshuo Zong, Da Li, Timothy Hospedales

A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective

Diffusion models have shown strong performance in generating high-quality tabular data, but they carry privacy risks by reproducing exact training samples. While prior work focuses on dataset-level augmentation to reduce memorization,…

Machine Learning · Computer Science 2026-05-26 Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Kaiyu Tang, Xiao Li, Jing Li

Beyond Memorization: Selective Learning for Copyright-Safe Diffusion Model Training

Memorization in large-scale text-to-image diffusion models poses significant security and intellectual property risks, enabling adversarial attribute extraction and the unauthorized reproduction of sensitive or proprietary features. While…

Machine Learning · Computer Science 2026-01-28 Divya Kothandaraman, Jaclyn Pytlarz

Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication

Diffusion-based models, such as the Stable Diffusion model, have revolutionized text-to-image synthesis with their ability to produce high-quality, high-resolution images. These advancements have prompted significant progress in image…

Cryptography and Security · Computer Science 2023-12-07 Ali Naseh, Jaechul Roh, Amir Houmansadr

Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models

Diffusion probabilistic models have become a cornerstone of modern generative AI, yet the mechanisms underlying their generalization remain poorly understood. In fact, if these models were perfectly minimizing their training loss, they…

Machine Learning · Computer Science 2025-09-03 Alessandro Favero, Antonio Sclocchi, Matthieu Wyart

Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data

When do language diffusion models memorize their training data, and how to quantitatively assess their true generative regime? We address these questions by showing that Uniform-based Discrete Diffusion Models (UDDMs) fundamentally behave…

Machine Learning · Computer Science 2026-04-30 Bao Pham, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov, Matteo Negri

Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models

The proliferation of diffusion models trained on web-scale, provenance-uncertain image collections has made it essential, yet technically unresolved, to determine whether a model has learned from specific copyrighted data without…

Machine Learning · Computer Science 2026-04-06 Muxing Li, Zesheng Ye, Sharon Li, Andy Song, Guangquan Zhang, Feng Liu

Demystifying Foreground-Background Memorization in Diffusion Models

Diffusion models (DMs) memorize training images and can reproduce near-duplicates during generation. Current detection methods identify verbatim memorization but fail to capture two critical aspects: quantifying partial memorization…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Jimmy Z. Di, Yiwei Lu, Yaoliang Yu, Gautam Kamath, Adam Dziedzic, Franziska Boenisch

Extracting Training Data from Diffusion Language Models via Infilling

Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary…

Computation and Language · Computer Science 2026-05-26 Yihan Wang, N. Asokan

Extracting Training Data from Diffusion Models

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual…

Cryptography and Security · Computer Science 2023-01-31 Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

The Emergence of Reproducibility and Generalizability in Diffusion Models

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often…

Machine Learning · Computer Science 2024-06-11 Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, Qing Qu

Training Data Protection with Compositional Diffusion Models

We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at…

Machine Learning · Computer Science 2024-10-15 Aditya Golatkar, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto