Related papers: Mean Flow Policy Optimization

Flow Matching Policy Gradients

Flow-based generative models, including diffusion models, excel at modeling continuous distributions in high-dimensional spaces. In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm…

Machine Learning · Computer Science · 2025-08-04 · David McAllister, Songwei Ge, Brent Yi, Chung Min Kim, Ethan Weber, Hongsuk Choi, Haiwen Feng, Angjoo Kanazawa

One-Step Flow Policy Mirror Descent

Diffusion policies have achieved great success in online reinforcement learning (RL) due to their strong expressive capacity. However, the inference of diffusion policy models relies on a slow iterative sampling process, which limits their…

Machine Learning · Computer Science · 2025-10-17 · Tianyi Chen, Haitong Ma, Na Li, Kai Wang, Bo Dai

One Step Is Enough: Dispersive MeanFlow Policy Optimization

Real-time robotic control demands fast action generation. However, existing generative policies based on diffusion and flow matching require multi-step sampling, fundamentally limiting deployment in time-critical scenarios. We propose…

Robotics · Computer Science · 2026-01-29 · Guowei Zou, Haitao Wang, Hejun Wu, Yukun Qian, Yuhang Wang, Weibing Li

Diffusion Policy through Conditional Proximal Policy Optimization

Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more…

Machine Learning · Computer Science · 2026-03-06 · Ben Liu, Shunpeng Yang, Hua Chen

Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching

Diffusion policies are expressive yet incur high inference latency. Flow Matching (FM) enables one-step generation, but integrating it into Maximum Entropy Reinforcement Learning (MaxEnt RL) is challenging: the optimal policy is an…

Machine Learning · Computer Science · 2026-02-03 · Zeqiao Li, Yijing Wang, Haoyu Wang, Zheng Li, Zhiqiang Zuo

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Generative models, particularly diffusion models, have achieved remarkable success in density estimation for multimodal data, drawing significant interest from the reinforcement learning (RL) community, especially in policy modeling in…

Machine Learning · Computer Science · 2024-12-03 · Jinouwen Zhang, Rongkun Xue, Yazhe Niu, Yun Chen, Jing Yang, Hongsheng Li, Yu Liu

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL…

Machine Learning · Computer Science · 2026-01-23 · Shutong Ding, Ke Hu, Shan Zhong, Haoyang Luo, Weinan Zhang, Jingya Wang, Jun Wang, Ye Shi

Training Diffusion Models with Reinforcement Learning

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science · 2024-01-08 · Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL…

Machine Learning · Computer Science · 2024-12-17 · Shutong Ding, Ke Hu, Zhenhao Zhang, Kan Ren, Weinan Zhang, Jingyi Yu, Jingya Wang, Ye Shi

FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies

Thanks to their remarkable flexibility, diffusion models and flow models have emerged as promising candidates for policy representation. However, efficient reinforcement learning (RL) upon these policies remains a challenge due to the lack…

Machine Learning · Computer Science · 2026-03-31 · Chenxiao Gao, Edward Chen, Tianyi Chen, Bo Dai

Diffusion Policy Policy Optimization

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy…

Robotics · Computer Science · 2024-12-11 · Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, Max Simchowitz

Policy Representation via Diffusion Probability Model for Reinforcement Learning

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which weakens the expressiveness of complicated policy and decays the ability of exploration. The diffusion probability model is powerful to…

Machine Learning · Computer Science · 2023-05-23 · Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

We propose DiFFPO, Diffusion Fast and Furious Policy Optimization, a unified framework for training masked diffusion large language models (dLLMs) to reason not only better (furious), but also faster via reinforcement learning (RL). We…

Machine Learning · Computer Science · 2026-01-13 · Hanyang Zhao, Dawen Liang, Wenpin Tang, David Yao, Nathan Kallus

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

Model-based reinforcement learning (RL) can be effectively supported at scale through the use of world models. However, in practice, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and…

Machine Learning · Computer Science · 2026-05-27 · Xiaoyuan Cheng, Wenxuan Yuan, Zhancun Mu, Yuanzhao Zhang, Yiming Yang, Hai Wang, Zhuo Sun, Che Liu

OM2P: Offline Multi-Agent Mean-Flow Policy

Generative models, especially diffusion and flow-based models, have been promising in offline multi-agent reinforcement learning. However, integrating powerful generative models into this framework poses unique challenges. In particular,…

Machine Learning · Computer Science · 2026-03-02 · Zhuoran Li, Xun Wang, Hai Zhong, Qingxin Xia, Lihua Zhang, Longbo Huang

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise…

Robotics · Computer Science · 2026-01-09 · Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang

Score-Based One-step MeanFlow Policy Optimization

Diffusion and flow matching have emerged as expressive policy classes in reinforcement learning, but their reliance on multi-step denoising imposes substantial computational overhead at inference time, which is particularly problematic in…

Machine Learning · Computer Science · 2026-05-25 · Kyungyoon Kim, Donghyeon Ki, Hee-Jun Ahn, Byung-Jun Lee

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The…

Machine Learning · Computer Science · 2025-05-06 · Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models

Diffusion language models, as a promising alternative to traditional autoregressive (AR) models, enable faster generation and richer conditioning on bidirectional context. However, they suffer from a key discrepancy between training and…

Machine Learning · Computer Science · 2025-09-26 · Haoyu He, Katrin Renz, Yong Cao, Andreas Geiger

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for…

Machine Learning · Computer Science · 2024-05-30 · Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He