Related papers: Data-regularized Reinforcement Learning for Diffus…

Distribution Matching Distillation Meets Reinforcement Learning

Distribution Matching Distillation (DMD) facilitates efficient inference by distilling multi-step diffusion models into few-step variants. Concurrently, Reinforcement Learning (RL) has emerged as a vital tool for aligning generative models…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-26 · Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, Bo Zhang, Mengmeng Wang, Steven Hoi, Peng Gao, Harry Yang

Reward Sharpness-Aware Fine-Tuning for Diffusion Models

Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models with human preferences, inspiring the development of reward-centric diffusion reinforcement learning (RDRL) to achieve similar…

Machine Learning · Computer Science · 2026-03-24 · Kwanyoung Kim, Byeongsu Sim

Large-scale Reinforcement Learning for Diffusion Models

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-24 · Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these…

Machine Learning · Computer Science · 2026-05-21 · Haitong Ma, Ofir Nabati, Aviv Rosenberg, Bo Dai, Oran Lang, Craig Boutilier, Na Li, Shie Mannor, Lior Shani, Guy Tenneholtz

Diffusion Reinforcement Learning via Centered Reward Distillation

Diffusion and flow models achieve state-of-the-art (SOTA) generative performance, yet many practically important behaviors, such as fine-grained prompt fidelity, compositional correctness, and text rendering, are weakly specified by score or…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton

Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning

Video diffusion alignment has relied heavily on scalar rewards. These rewards are typically derived from reward models learned on human preference datasets, which require additional training and extensive data collection. Moreover, scalar…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-18 · Yifan Wang, Yanyu Li, Gordon Guocheng Qian, Sergey Tulyakov, Yun Fu, Anil Kag

Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios

In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market…

Machine Learning · Statistics · 2025-10-09 · Himanshu Choudhary, Arishi Orra, Manoj Thakur

A Differential Perspective on Distributional Reinforcement Learning

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL…

Machine Learning · Computer Science · 2026-01-14 · Juan Sebastian Rojas, Chi-Guhn Lee

Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset, which is common in realistic tasks to prevent unsafe exploration. To address…

Machine Learning · Computer Science · 2025-09-08 · Junyu Guo, Zhi Zheng, Donghao Ying, Ming Jin, Shangding Gu, Costas Spanos, Javad Lavaei

Open Problems and Modern Solutions for Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) has achieved great success in solving complicated decision-making problems. Despite these successes, DRL is frequently criticized for many reasons, e.g., data inefficiency, inflexibility and intractable reward…

Machine Learning · Computer Science · 2023-02-07 · Weiqin Chen

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Diffusion models have achieved remarkable success in text-to-image generation. However, their practical applications are hindered by the misalignment between generated images and corresponding text prompts. To tackle this issue,…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Zijing Hu, Fengda Zhang, Long Chen, Kun Kuang, Jiahui Li, Kaifeng Gao, Jun Xiao, Xin Wang, Wenwu Zhu

The $f$-Divergence Reinforcement Learning Framework

The framework of deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization for sequential decision-making. This paper presents a novel DRL framework, termed $f$-Divergence Reinforcement…

Machine Learning · Computer Science · 2021-12-15 · Chen Gong, Qiang He, Yunpeng Bai, Zhou Yang, Xiaoyu Chen, Xinwen Hou, Xianjie Zhang, Yu Liu, Guoliang Fan

Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. While diffusion models are widely known to provide excellent generative modeling capability, practical…

Machine Learning · Computer Science · 2024-07-19 · Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, Sergey Levine

A Transferable and Automatic Tuning of Deep Reinforcement Learning for Cost Effective Phishing Detection

Many challenging real-world problems require the deployment of ensembles of multiple complementary learning models to reach acceptable performance levels. While effective, applying the entire ensemble to every sample is costly and often…

Cryptography and Security · Computer Science · 2022-09-20 · Orel Lavie, Asaf Shabtai, Gilad Katz

Sample-based Distributional Policy Gradient

Distributional reinforcement learning (DRL) is a recent reinforcement learning framework whose success has been supported by various empirical studies. It relies on the key idea of replacing the expected return with the return distribution,…

Machine Learning · Computer Science · 2020-01-09 · Rahul Singh, Keuntaek Lee, Yongxin Chen

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large datasets have achieved success, there is often a need to…

Machine Learning · Computer Science · 2025-02-25 · Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Sunyuan Kung, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali

Improve the Training Efficiency of DRL for Wireless Communication Resource Allocation: The Role of Generative Diffusion Models

Dynamic resource allocation in mobile wireless networks involves complex, time-varying optimization problems, motivating the adoption of deep reinforcement learning (DRL). However, most existing works rely on pre-trained policies,…

Machine Learning · Computer Science · 2025-02-12 · Xinren Zhang, Jiadong Yu

Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed…

Artificial Intelligence · Computer Science · 2026-03-13 · Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar

Distributional Reward Decomposition for Reinforcement Learning

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward…

Machine Learning · Computer Science · 2019-11-07 · Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Guangwen Yang, Tie-Yan Liu

$R_\text{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation

Diffusion models achieve state-of-the-art generative performance but are fundamentally bottlenecked by their slow, iterative sampling process. While diffusion distillation techniques enable high-fidelity, few-step generation, traditional…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Linqian Fan, Peiqin Sun, Tiancheng Wen, Shun Lu, Chengru Song