Related papers: Dichotomous Diffusion Policy Optimization

Diffusion Policy Policy Optimization

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy…

Robotics · Computer Science 2024-12-11 Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, Max Simchowitz

Reinforcing Diffusion Models by Direct Group Preference Optimization

While reinforcement learning methods such as Group Relative Preference Optimization (GRPO) have significantly enhanced Large Language Models, adapting them to diffusion models remains challenging. In particular, GRPO demands a stochastic…

Machine Learning · Computer Science 2025-10-10 Yihong Luo, Tianyang Hu, Jing Tang

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

We propose DiFFPO, Diffusion Fast and Furious Policy Optimization, a unified framework for training masked diffusion large language models (dLLMs) to reason not only better (furious), but also faster via reinforcement learning (RL). We…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Dawen Liang, Wenpin Tang, David Yao, Nathan Kallus

Diffusion Policy through Conditional Proximal Policy Optimization

Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more…

Machine Learning · Computer Science 2026-03-06 Ben Liu, Shunpeng Yang, Hua Chen

Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning

In offline reinforcement learning, value overestimation caused by out-of-distribution (OOD) actions significantly limits policy performance. Recently, diffusion models have been leveraged for their strong distribution-matching capabilities,…

Machine Learning · Computer Science 2025-11-13 Yunchang Ma, Tenglong Liu, Yixing Lan, Xin Yin, Changxin Zhang, Xinglong Zhang, Xin Xu

Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL…

Machine Learning · Computer Science 2024-12-17 Shutong Ding, Ke Hu, Zhenhao Zhang, Kan Ren, Weinan Zhang, Jingyi Yu, Jingya Wang, Ye Shi

Adaptive Diffusion Policy Optimization for Robotic Manipulation

Recent studies have shown the great potential of diffusion models in improving reinforcement learning (RL) by modeling complex policies, expressing a high degree of multi-modality, and efficiently handling high-dimensional continuous…

Robotics · Computer Science 2025-05-14 Huiyun Jiang, Zhuang Yang

Training Diffusion Models with Reinforcement Learning

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science 2024-01-08 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint…

Machine Learning · Computer Science 2026-05-15 Quanhao Li, Junqiu Yu, Kaixun Jiang, Yujie Wei, Zhen Xing, Pandeng Li, Ruihang Chu, Shiwei Zhang, Yu Liu, Zuxuan Wu

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function…

Machine Learning · Computer Science 2023-08-29 Zhendong Wang, Jonathan J Hunt, Mingyuan Zhou

Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization

Diffusion large language models (dLLMs) are promising alternatives to autoregressive large language models (AR-LLMs), as they potentially allow higher inference throughput. Reinforcement learning (RL) is a crucial component for dLLMs to…

Machine Learning · Computer Science 2026-02-24 Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for…

Machine Learning · Computer Science 2024-05-30 Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their…

Machine Learning · Computer Science 2025-06-11 Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann

Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal…

Machine Learning · Computer Science 2025-09-30 Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu

Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning…

Machine Learning · Computer Science 2026-02-12 Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these…

Machine Learning · Computer Science 2026-05-21 Haitong Ma, Ofir Nabati, Aviv Rosenberg, Bo Dai, Oran Lang, Craig Boutilier, Na Li, Shie Mannor, Lior Shani, Guy Tennenholtz

Policy Representation via Diffusion Probability Model for Reinforcement Learning

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which weakens the expressiveness of complicated policy and decays the ability of exploration. The diffusion probability model is powerful to…

Machine Learning · Computer Science 2023-05-23 Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin

Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective in modeling complex, high-dimensional data distributions, real-world…

Machine Learning · Computer Science 2026-03-03 Xingyu Su, Xiner Li, Masatoshi Uehara, Sunwoo Kim, Yulai Zhao, Gabriele Scalia, Ehsan Hajiramezanali, Tommaso Biancalani, Degui Zhi, Shuiwang Ji

DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent cannot often cover the action distribution under a given…

Machine Learning · Computer Science 2024-06-14 Xuemin Hu, Shen Li, Yingfen Xu, Bo Tang, Long Chen

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models

Reinforcement Learning (RL) has emerged as a central paradigm for advancing Large Language Models (LLMs), where pre-training and RL post-training share the same log-likelihood formulation. In contrast, recent RL approaches for diffusion…

Machine Learning · Computer Science 2025-09-30 Shuchen Xue, Chongjian Ge, Shilong Zhang, Yichen Li, Zhi-Ming Ma