Related papers: Diffusion Blend: Inference-Time Multi-Preference A…

Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities,…

Artificial Intelligence · Computer Science 2025-01-22 Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, Tommaso Biancalani

Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. While diffusion models are widely known to provide excellent generative modeling capability, practical…

Machine Learning · Computer Science 2024-07-19 Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, Sergey Levine

Step-level Denoising-time Diffusion Alignment with Multiple Objectives

Reinforcement learning (RL) has emerged as a powerful tool for aligning diffusion models with human preferences, typically by optimizing a single reward function under a KL regularization constraint. In practice, however, human preferences…

Machine Learning · Computer Science 2026-04-17 Qi Zhang, Dawei Wang, Shaofeng Zou

Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey

Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Preeti Lamba, Kiran Ravish, Ankita Kushwaha, Pawan Kumar

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

In this paper, we make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework, significantly improving the quality and visual appeal of inpainted images.…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Kendong Liu, Zhiyu Zhu, Chuanhao Li, Hui Liu, Huanqiang Zeng, Junhui Hou

Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward

Reinforcement Learning (RL) has recently been incorporated into diffusion models, e.g., in tasks such as text-to-image. However, directly applying existing RL methods to diffusion-based image restoration models is suboptimal, as the objective…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Xiaogang Xu, Ruihang Chu, Jian Wang, Kun Zhou, Wenjie Shu, Harry Yang, Ser-Nam Lim, Hao Chen, Liang Lin

Feedback Efficient Online Fine-Tuning of Diffusion Models

Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example,…

Machine Learning · Computer Science 2024-07-19 Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Diffusion models have achieved remarkable success in text-to-image generation. However, their practical applications are hindered by the misalignment between generated images and corresponding text prompts. To tackle this issue,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Zijing Hu, Fengda Zhang, Long Chen, Kun Kuang, Jiahui Li, Kaifeng Gao, Jun Xiao, Xin Wang, Wenwu Zhu

Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

Denoising-based generative models, particularly diffusion and flow matching algorithms, have achieved remarkable success. However, aligning their output distributions with complex downstream objectives, such as human preferences,…

Machine Learning · Computer Science 2025-08-29 Luozhijie Jin, Zijie Qiu, Jie Liu, Zijie Diao, Lifeng Qiao, Ning Ding, Alex Lamb, Xipeng Qiu

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies

Diffusion models have achieved remarkable success in sequential decision-making by leveraging the highly expressive model capabilities in policy learning. A central problem for learning diffusion policies is to align the policy output with…

Machine Learning · Computer Science 2024-12-17 Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai

Data-regularized Reinforcement Learning for Diffusion Models at Scale

Aligning generative diffusion models with human preferences via reinforcement learning (RL) is critical yet challenging. Most existing algorithms are often vulnerable to reward hacking, such as quality degradation, over-stylization, or…

Machine Learning · Computer Science 2025-12-25 Haotian Ye, Kaiwen Zheng, Jiashu Xu, Puheng Li, Huayu Chen, Jiaqi Han, Sheng Liu, Qinsheng Zhang, Hanzi Mao, Zekun Hao, Prithvijit Chattopadhyay, Dinghao Yang, Liang Feng, Maosheng Liao, Junjie Bai, Ming-Yu Liu, James Zou, Stefano Ermon

MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models

Diffusion models excel at generating images conditioned on text prompts, but the resulting images often do not satisfy user-specific criteria measured by scalar rewards such as Aesthetic Scores. This alignment typically requires…

Machine Learning · Computer Science 2025-10-03 Kevin Zhai, Utsav Singh, Anirudh Thatipelli, Souradip Chakraborty, Anit Kumar Sahu, Furong Huang, Amrit Singh Bedi, Mubarak Shah

Inference-Time Alignment of Diffusion Models with Direct Noise Optimization

In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as increasing darkness or improving the aesthetics of images. The central…

Machine Learning · Computer Science 2024-10-03 Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models

Inference-time alignment provides an efficient alternative for aligning LLMs with humans. However, these approaches still face challenges, such as limited scalability due to policy-specific value functions and latency during the inference…

Computation and Language · Computer Science 2025-05-27 Ruizhe Chen, Wenhao Chai, Zhifei Yang, Xiaotian Zhang, Joey Tianyi Zhou, Tony Quek, Soujanya Poria, Zuozhu Liu

Inference-Time Alignment of Diffusion Models via Evolutionary Algorithms

Diffusion models are state-of-the-art generative models, yet their samples often fail to satisfy application objectives such as safety constraints or domain-specific validity. Existing techniques for alignment require gradients, internal…

Machine Learning · Computer Science 2025-11-27 Purvish Jajal, Nick John Eliopoulos, Benjamin Shiue-Hal Chou, George K. Thiruvathukal, James C. Davis, Yung-Hsiang Lu

Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization

Diffusion models recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. However, training such models from scratch is both…

Information Retrieval · Computer Science 2025-11-11 Yu Hou, Hua Li, Ha Young Kim, Won-Yong Shin

Large-scale Reinforcement Learning for Diffusion Models

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale…

Computer Vision and Pattern Recognition · Computer Science 2024-01-24 Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that…

Machine Learning · Computer Science 2024-03-29 Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Canyu Zhao, Hao Chen, Yunze Tong, Yu Qiao, Jiacheng Li, Chunhua Shen

Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing

The integration of preference alignment with diffusion models (DMs) has emerged as a transformative approach to enhance image generation and editing capabilities. Although integrating diffusion models with preference alignment strategies…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Sihao Wu, Xiaonan Si, Chi Xing, Jianhong Wang, Gaojie Jin, Guangliang Cheng, Lijun Zhang, Xiaowei Huang