Related papers: Diffusion Attribution Score: Evaluating Training D…

Nonparametric Data Attribution for Diffusion Models

Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. Existing methods for diffusion models typically require access to model gradients or retraining, limiting their…

Machine Learning · Computer Science · 2025-10-17 · Yutian Zhao, Chao Du, Xiaosen Zheng, Tianyu Pang, Min Lin

Training Data Attribution for Diffusion Models

Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, given the oftentimes enormous sizes of the training datasets, it is difficult to assess how training data impact…

Machine Learning · Statistics · 2023-06-06 · Zheng Dai, David K Gifford

Intriguing Properties of Data Attribution on Diffusion Models

Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training…

Machine Learning · Computer Science · 2024-03-18 · Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin

Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Data attribution methods trace model behavior back to its training dataset, offering an effective approach to better understand "black-box" neural networks. While prior research has established quantifiable links between model output and…

Machine Learning · Computer Science · 2024-07-30 · Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

Influence Functions for Scalable Data Attribution in Diffusion Models

Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in…

Machine Learning · Computer Science · 2025-05-27 · Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, Richard Turner

Fast Data Attribution for Text-to-Image Models

Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable computational resources for each query, making them…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-17 · Sheng-Yu Wang, Aaron Hertzmann, Alexei A Efros, Richard Zhang, Jun-Yan Zhu

Exploring Training Data Attribution under Limited Access Constraints

Training data attribution (TDA) plays a critical role in understanding the influence of individual training data points on model predictions. Gradient-based TDA methods, popularized by influence functions for their superior…

Machine Learning · Computer Science · 2025-09-17 · Shiyuan Zhang, Junwei Deng, Juhan Bae, Jiaqi Ma

Sensitivity Analysis for Diffusion Models

Training a diffusion model approximates a map from a data distribution $\rho$ to the optimal score function $s_t$ for that distribution. Can we differentiate this map? If we could, then we could predict how the score, and ultimately the…

Machine Learning · Computer Science · 2025-09-30 · Christopher Scarvelis, Justin Solomon

Revisiting Data Attribution for Influence Functions

The goal of data attribution is to trace the model's predictions through the learning algorithm and back to its training data, thereby identifying the most influential training samples and understanding how the model's behavior leads to…

Machine Learning · Computer Science · 2025-08-12 · Hongbo Zhu, Angelo Cangelosi

Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models

The proliferation of diffusion models trained on web-scale, provenance-uncertain image collections has made it essential, yet technically unresolved, to determine whether a model has learned from specific copyrighted data without…

Machine Learning · Computer Science · 2026-04-06 · Muxing Li, Zesheng Ye, Sharon Li, Andy Song, Guangquan Zhang, Feng Liu

Bootstrapping Diffusion: Diffusion Model Training Leveraging Partial and Corrupted Data

Training diffusion models requires large datasets. However, acquiring large volumes of high-quality data can be challenging, for example, collecting large numbers of high-resolution images and long videos. On the other hand, there are many…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Xudong Ma

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data

Diffusion models achieve state-of-the-art performance in various generation tasks. However, their theoretical foundations fall far behind. This paper studies score approximation, estimation, and distribution recovery of diffusion models,…

Machine Learning · Computer Science · 2023-02-15 · Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang

Rescaled Influence Functions: Accurate Data Attribution in High Dimension

How does the training data affect a model's behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IF). IFs utilize a first-order Taylor…

Machine Learning · Computer Science · 2025-09-11 · Ittai Rubinstein, Samuel B. Hopkins

DMin: Scalable Training Data Influence Estimation for Diffusion Models

Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models (DMs), yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-10 · Huawei Lin, Yingjie Lao, Weijie Zhao

Daunce: Data Attribution through Uncertainty Estimation

Training data attribution (TDA) methods aim to identify which training examples most influence a model's predictions on specific test data. By quantifying these influences, TDA supports critical applications such as data debugging,…

Machine Learning · Computer Science · 2025-05-30 · Xingyuan Pan, Chenlu Ye, Joseph Melkonian, Jiaqi W. Ma, Tong Zhang

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-20 · Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models

Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases…

Machine Learning · Computer Science · 2025-06-27 · Vineet Jain, Kusha Sareen, Mohammad Pedramfar, Siamak Ravanbakhsh

Elucidating the Exposure Bias in Diffusion Models

Diffusion models have demonstrated impressive generative capabilities, but their exposure bias problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically…

Machine Learning · Computer Science · 2024-04-12 · Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul

Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

While diffusion models excel at image generation, their growing adoption raises critical concerns about copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-03 · Yonghyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji

Distributional Training Data Attribution: What do Influence Functions Sample?

Randomness is an unavoidable part of training deep learning models, yet something that traditional training data attribution algorithms fail to rigorously account for. They ignore the fact that, due to stochasticity in the initialisation…

Machine Learning · Computer Science · 2025-10-28 · Bruno Mlodozeniec, Isaac Reid, Sam Power, David Krueger, Murat Erdogdu, Richard E. Turner, Roger Grosse