Related papers: Training Data Attribution for Diffusion Models
As diffusion models become increasingly popular, the misuse of copyrighted and private images has emerged as a major concern. One promising solution to mitigate this issue is identifying the contribution of specific training samples in…
Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's…
Data attribution methods trace model behavior back to its training dataset, offering an effective approach to better understand ''black-box'' neural networks. While prior research has established quantifiable links between model output and…
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. Existing methods for diffusion models typically require access to model gradients or retraining, limiting their…
Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models (DMs), yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to…
Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in…
Diffusion models have attained prominence for their ability to synthesize a probability distribution for a given dataset via a diffusion process, enabling the generation of new data points with high fidelity. However, diffusion processes…
Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With a distinguished performance in generating samples that resemble the observed data,…
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual…
In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often…
Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real…
The diffusion probabilistic generative models are widely used to generate high-quality data. Though they can synthetic data that does not exist in the training set, the rationale behind such generalization is still unexplored. In this…
Randomness is an unavoidable part of training deep learning models, yet something that traditional training data attribution algorithms fail to rigorously account for. They ignore the fact that, due to stochasticity in the initialisation…
Although numerical weather forecasting methods have dominated the field, recent advances in deep learning methods, such as diffusion models, have shown promise in ensemble weather forecasting. However, such models are typically…
Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible…
Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it…
Training diffusion models requires large datasets. However, acquiring large volumes of high-quality data can be challenging, for example, collecting large numbers of high-resolution images and long videos. On the other hand, there are many…
Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training…
The success of diffusion probabilistic models in generative tasks, such as text-to-image generation, has motivated the exploration of their application to regression problems commonly encountered in scientific computing and various other…
The goal of data attribution is to trace the model's predictions through the learning algorithm and back to its training data. thereby identifying the most influential training samples and understanding how the model's behavior leads to…