Related papers: Extracting Reward Functions from Diffusion Models

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from…

Machine Learning · Computer Science 2024-08-12 Tao Huang, Guangqi Jiang, Yanjie Ze, Huazhe Xu

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in…

Machine Learning · Computer Science 2023-07-17 Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

Video Diffusion Alignment via Reward Gradients

We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt these models to specific downstream tasks. Adapting…

Computer Vision and Pattern Recognition · Computer Science 2024-07-12 Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak

Non-differentiable Reward Optimization for Diffusion-based Autonomous Motion Planning

Safe and effective motion planning is crucial for autonomous robots. Diffusion models excel at capturing complex agent interactions, a fundamental aspect of decision-making in dynamic environments. Recent studies have successfully applied…

Robotics · Computer Science 2025-07-18 Giwon Lee, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon

Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control

Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other…

Machine Learning · Computer Science 2024-02-29 Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while…

Machine Learning · Computer Science 2024-10-28 Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, Masatoshi Uehara

Feedback Efficient Online Fine-Tuning of Diffusion Models

Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example,…

Machine Learning · Computer Science 2024-07-19 Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning

Reinforcement Learning (RL) has achieved remarkable success in various domains, yet it often relies on carefully designed programmatic reward functions to guide agent behavior. Designing such reward functions can be challenging and may not…

Machine Learning · Computer Science 2026-04-06 Qi Wang, Mian Wu, Yuyang Zhang, Mingqi Yuan, Wenyao Zhang, Haoxiang You, Yunbo Wang, Xin Jin, Xiaokang Yang, Wenjun Zeng

DRM: Diffusion-based Reward Model With Step-wise Guidance

Current mainstream methods of aligning diffusion models with human preferences typically employ VLM-based reward models. However, these reward models, pre-trained for semantic alignment, struggle to capture the essential perceptual…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Jaxon Zhang, Binxin Yang, Hubery Yin, Chen Li, Jing Lyu

Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. While diffusion models are widely known to provide excellent generative modeling capability, practical…

Machine Learning · Computer Science 2024-07-19 Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, Sergey Levine

Training-Free Reward-Guided Image Editing via Trajectory Optimal Control

Recent advancements in diffusion and flow-matching models have demonstrated remarkable capabilities in high-fidelity image synthesis. A prominent line of research involves reward-guided guidance, which steers the generation process during…

Computer Vision and Pattern Recognition · Computer Science 2026-05-01 Jinho Chang, Jaemin Kim, Jong Chul Ye

A Reward-Directed Diffusion Framework for Generative Design Optimization

This study presents a generative optimization framework that builds on a fine-tuned diffusion model and reward-directed sampling to generate high-performance engineering designs. The framework adopts a parametric representation of the…

Machine Learning · Computer Science 2025-08-05 Hadi Keramati, Patrick Kirchen, Mohammed Hannan, Rajeev K. Jaiman

Outcome-Driven Reinforcement Learning via Variational Inference

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the…

Machine Learning · Computer Science 2022-12-29 Tim G. J. Rudner, Vitchyr H. Pong, Rowan McAllister, Yarin Gal, Sergey Levine

Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning

Video diffusion alignment has relied heavily on scalar rewards. These rewards are typically derived from learned reward models in human preference datasets, requiring additional training and extensive collection. Moreover, scalar…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Yifan Wang, Yanyu Li, Gordon Guocheng Qian, Sergey Tulyakov, Yun Fu, Anil Kag

Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation

Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function,…

Machine Learning · Statistics 2026-02-03 Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to…

Computer Vision and Pattern Recognition · Computer Science 2024-06-24 Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet

Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities,…

Artificial Intelligence · Computer Science 2025-01-22 Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, Tommaso Biancalani

Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey

Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Preeti Lamba, Kiran Ravish, Ankita Kushwaha, Pawan Kumar

Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective in modeling complex, high-dimensional data distributions, real-world…

Machine Learning · Computer Science 2026-03-03 Xingyu Su, Xiner Li, Masatoshi Uehara, Sunwoo Kim, Yulai Zhao, Gabriele Scalia, Ehsan Hajiramezanali, Tommaso Biancalani, Degui Zhi, Shuiwang Ji

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed…

Machine Learning · Computer Science 2019-06-10 Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris