Related papers: Regularized Conditional Diffusion Model for Multi-…

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies

Diffusion models have achieved remarkable success in sequential decision-making by leveraging the highly expressive model capabilities in policy learning. A central problem for learning diffusion policies is to align the policy output with…

Machine Learning · Computer Science 2024-12-17 Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai

MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

Recently, diffusion model shines as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL). However, these works mostly lack the generalization ability across tasks with reward or dynamics change. To…

Machine Learning · Computer Science 2023-06-01 Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang

Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed…

Artificial Intelligence · Computer Science 2026-03-13 Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar

Is Conditional Generative Modeling all you need for Decision-Making?

Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential…

Machine Learning · Computer Science 2023-07-11 Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large datasets have achieved success, there is often a need to…

Machine Learning · Computer Science 2025-02-25 Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Sunyuan Kung, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali

PC-Diffusion: Aligning Diffusion Models with Human Preferences via Preference Classifier

Diffusion models have achieved remarkable success in conditional image generation, yet their outputs often remain misaligned with human preferences. To address this, recent work has applied Direct Preference Optimization (DPO) to diffusion…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Shaomeng Wang, He Wang, Xiaolu Wei, Longquan Dai, Jinhui Tang

Score Regularized Policy Optimization through Diffusion Behavior

Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow…

Machine Learning · Computer Science 2024-03-18 Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

Diffusion models have made substantial advances in image generation, yet models trained on large, unfiltered datasets often yield outputs misaligned with human preferences. Numerous methods have been proposed to fine-tune pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2025-05-19 Fu-Yun Wang, Yunhao Shui, Jingtan Piao, Keqiang Sun, Hongsheng Li

Reinforcement Learning from Diverse Human Preferences

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new…

Machine Learning · Computer Science 2024-05-09 Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

Data-regularized Reinforcement Learning for Diffusion Models at Scale

Aligning generative diffusion models with human preferences via reinforcement learning (RL) is critical yet challenging. Most existing algorithms are often vulnerable to reward hacking, such as quality degradation, over-stylization, or…

Machine Learning · Computer Science 2025-12-25 Haotian Ye, Kaiwen Zheng, Jiashu Xu, Puheng Li, Huayu Chen, Jiaqi Han, Sheng Liu, Qinsheng Zhang, Hanzi Mao, Zekun Hao, Prithvijit Chattopadhyay, Dinghao Yang, Liang Feng, Maosheng Liao, Junjie Bai, Ming-Yu Liu, James Zou, Stefano Ermon

Towards Controllable Diffusion Models via Reward-Guided Exploration

By formulating data samples' formation as a Markov denoising process, diffusion models achieve state-of-the-art performances in a collection of tasks. Recently, many variants of diffusion models have been proposed to enable controlled…

Machine Learning · Computer Science 2023-04-17 Hengtong Zhang, Tingyang Xu

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in…

Machine Learning · Computer Science 2023-07-17 Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners

This work addresses the challenge of personalizing trajectories generated in automated decision-making systems by introducing a resource-efficient approach that enables rapid adaptation to individual users' preferences. Our method leverages…

Machine Learning · Computer Science 2025-03-25 Wen Zheng Terence Ng, Jianda Chen, Yuan Xu, Tianwei Zhang

Planning with Diffusion for Flexible Behavior Synthesis

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple,…

Machine Learning · Computer Science 2022-12-22 Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine

Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey

Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Preeti Lamba, Kiran Ravish, Ankita Kushwaha, Pawan Kumar

Maximize Your Diffusion: A Study into Reward Maximization and Alignment for Diffusion-based Control

Diffusion-based planning, learning, and control methods present a promising branch of powerful and expressive decision-making solutions. Given the growing interest, such methods have undergone numerous refinements over the past years.…

Machine Learning · Computer Science 2025-02-19 Dom Huh, Prasant Mohapatra

Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation

Diffusion-based models are recognized for their effectiveness in using real-world driving data to generate realistic and diverse traffic scenarios. These models employ guided sampling to incorporate specific traffic preferences and enhance…

Machine Learning · Computer Science 2025-02-19 Seungjun Yu, Kisung Kim, Daejung Kim, Haewook Han, Jinhan Lee

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for…

Machine Learning · Computer Science 2024-05-30 Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

Discrete Conditional Diffusion for Reranking in Recommendation

Reranking plays a crucial role in modern multi-stage recommender systems by rearranging the initial ranking list to model interplay between items. Considering the inherent challenges of reranking such as combinatorial searching space, some…

Information Retrieval · Computer Science 2023-08-15 Xiao Lin, Xiaokai Chen, Chenyang Wang, Hantao Shu, Linfeng Song, Biao Li, Peng Jiang

A Simple Approach to Unifying Diffusion-based Conditional Generation

Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation. Instead of proposing another specialized…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Xirui Li, Charles Herrmann, Kelvin C. K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang