Related papers: Training Data Attribution for Diffusion Models

Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models

As diffusion models become increasingly popular, the misuse of copyrighted and private images has emerged as a major concern. One promising solution to mitigate this issue is identifying the contribution of specific training samples in…

Machine Learning · Computer Science 2025-03-24 Jinxu Lin, Linwei Tao, Minjing Dong, Chang Xu

Training Data Influence Analysis and Estimation: A Survey

Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's…

Machine Learning · Computer Science 2024-04-02 Zayd Hammoudeh, Daniel Lowd

Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Data attribution methods trace model behavior back to its training dataset, offering an effective approach to better understand "black-box" neural networks. While prior research has established quantifiable links between model output and…

Machine Learning · Computer Science 2024-07-30 Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

Nonparametric Data Attribution for Diffusion Models

Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. Existing methods for diffusion models typically require access to model gradients or retraining, limiting their…

Machine Learning · Computer Science 2025-10-17 Yutian Zhao, Chao Du, Xiaosen Zheng, Tianyu Pang, Min Lin

DMin: Scalable Training Data Influence Estimation for Diffusion Models

Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models (DMs), yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Huawei Lin, Yingjie Lao, Weijie Zhao

Influence Functions for Scalable Data Attribution in Diffusion Models

Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in…

Machine Learning · Computer Science 2025-05-27 Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, Richard Turner

Constrained Diffusion Models via Dual Training

Diffusion models have attained prominence for their ability to synthesize a probability distribution for a given dataset via a diffusion process, enabling the generation of new data points with high fidelity. However, diffusion processes…

Machine Learning · Computer Science 2024-11-25 Shervin Khalafi, Dongsheng Ding, Alejandro Ribeiro

Diffusion Models for Time Series Applications: A Survey

Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With a distinguished performance in generating samples that resemble the observed data,…

Machine Learning · Computer Science 2023-05-02 Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, Junbin Gao

Extracting Training Data from Diffusion Models

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual…

Cryptography and Security · Computer Science 2023-01-31 Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

The Emergence of Reproducibility and Generalizability in Diffusion Models

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often…

Machine Learning · Computer Science 2024-06-11 Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, Qing Qu

On the Limitation of Diffusion Models for Synthesizing Training Datasets

Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real…

Artificial Intelligence · Computer Science 2023-11-23 Shin'ya Yamaguchi, Takuma Fukuda

On the Generalization of Diffusion Model

Diffusion probabilistic generative models are widely used to generate high-quality data. Though they can synthesize data that does not exist in the training set, the rationale behind such generalization is still unexplored. In this…

Machine Learning · Computer Science 2023-05-25 Mingyang Yi, Jiacheng Sun, Zhenguo Li

Distributional Training Data Attribution: What do Influence Functions Sample?

Randomness is an unavoidable part of training deep learning models, yet something that traditional training data attribution algorithms fail to rigorously account for. They ignore the fact that, due to stochasticity in the initialisation…

Machine Learning · Computer Science 2025-10-28 Bruno Mlodozeniec, Isaac Reid, Sam Power, David Krueger, Murat Erdogdu, Richard E. Turner, Roger Grosse

Data-Efficient Ensemble Weather Forecasting with Diffusion Models

Although numerical weather forecasting methods have dominated the field, recent advances in deep learning methods, such as diffusion models, have shown promise in ensemble weather forecasting. However, such models are typically…

Machine Learning · Computer Science 2025-09-16 Kevin Valencia, Ziyang Liu, Justin Cui

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible…

Machine Learning · Computer Science 2024-04-12 Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang

Image retrieval outperforms diffusion models on data augmentation

Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it…

Computer Vision and Pattern Recognition · Computer Science 2023-12-01 Max F. Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell

Bootstrapping Diffusion: Diffusion Model Training Leveraging Partial and Corrupted Data

Training diffusion models requires large datasets. However, acquiring large volumes of high-quality data can be challenging, for example, collecting large numbers of high-resolution images and long videos. On the other hand, there are many…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Xudong Ma

Intriguing Properties of Data Attribution on Diffusion Models

Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training…

Machine Learning · Computer Science 2024-03-18 Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin

Zero-Shot Uncertainty Quantification using Diffusion Probabilistic Models

The success of diffusion probabilistic models in generative tasks, such as text-to-image generation, has motivated the exploration of their application to regression problems commonly encountered in scientific computing and various other…

Machine Learning · Computer Science 2024-08-12 Dule Shu, Amir Barati Farimani

Revisiting Data Attribution for Influence Functions

The goal of data attribution is to trace the model's predictions through the learning algorithm and back to its training data, thereby identifying the most influential training samples and understanding how the model's behavior leads to…

Machine Learning · Computer Science 2025-08-12 Hongbo Zhu, Angelo Cangelosi