Related papers: Diffusion Policy Policy Optimization

200 papers

Recent studies have shown the great potential of diffusion models in improving reinforcement learning (RL) by modeling complex policies, expressing a high degree of multi-modality, and efficiently handling high-dimensional continuous…

Robotics · Computer Science 2025-05-14 Huiyun Jiang, Zhuang Yang

Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal…

Machine Learning · Computer Science 2025-09-30 Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu

Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion…

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science 2024-01-08 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Flow-based generative models, including diffusion models, excel at modeling continuous distributions in high-dimensional spaces. In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm…

Machine Learning · Computer Science 2025-08-04 David McAllister, Songwei Ge, Brent Yi, Chung Min Kim, Ethan Weber, Hongsuk Choi, Haiwen Feng, Angjoo Kanazawa

Diffusion policies excel at robotic manipulation by naturally modeling multimodal action distributions in high-dimensional spaces. Nevertheless, diffusion policies suffer from diffusion representation collapse: semantically similar…

Artificial Intelligence · Computer Science 2026-04-23 Guowei Zou, Weibing Li, Hejun Wu, Yukun Qian, Yuhang Wang, Haitao Wang

This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on production and inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for…

Machine Learning · Computer Science 2023-02-06 Jaime Sabal Bermúdez, Antonio del Rio Chanona, Calvin Tsay

Diffusion large language models (dLLMs) are promising alternatives to autoregressive large language models (AR-LLMs), as they potentially allow higher inference throughput. Reinforcement learning (RL) is a crucial component for dLLMs to…

Machine Learning · Computer Science 2026-02-24 Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen

Recent research has made significant progress in optimizing diffusion models for downstream objectives, which is an important pursuit in fields such as graph generation for drug design. However, directly applying these models to graph…

Machine Learning · Computer Science 2024-10-28 Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Wei Chen

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL…

Machine Learning · Computer Science 2024-12-17 Shutong Ding, Ke Hu, Zhenhao Zhang, Kan Ren, Weinan Zhang, Jingyi Yu, Jingya Wang, Ye Shi

The policy gradient method enjoys the simplicity of the objective where the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, parameterized distribution of action distribution allows easy control of…

Machine Learning · Computer Science 2022-12-16 Md Masudur Rahman, Yexiang Xue

While reinforcement learning methods such as Group Relative Preference Optimization (GRPO) have significantly enhanced Large Language Models, adapting them to diffusion models remains challenging. In particular, GRPO demands a stochastic…

Machine Learning · Computer Science 2025-10-10 Yihong Luo, Tianyang Hu, Jing Tang

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio…

Machine Learning · Computer Science 2026-05-27 Penghui Qi, Xiangxin Zhou, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which weakens the expressiveness of complicated policy and decays the ability of exploration. The diffusion probability model is powerful to…

Machine Learning · Computer Science 2023-05-23 Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin

Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more…

Machine Learning · Computer Science 2026-03-06 Ben Liu, Shunpeng Yang, Hua Chen

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL…

Machine Learning · Computer Science 2026-01-23 Shutong Ding, Ke Hu, Shan Zhong, Haoyang Luo, Weinan Zhang, Jingya Wang, Jun Wang, Ye Shi

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these…

Machine Learning · Computer Science 2026-05-21 Haitong Ma, Ofir Nabati, Aviv Rosenberg, Bo Dai, Oran Lang, Craig Boutilier, Na Li, Shie Mannor, Lior Shani, Guy Tenneholtz

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng, Hong-Han Shuai

Latent diffusion models are the state of the art for synthetic image generation. To align these models with human preferences, training the models using reinforcement learning on human feedback is crucial. Black et al. (2024) introduced…

Machine Learning · Computer Science 2024-04-09 Mo Kordzanganeh, Danial Keshvary, Nariman Arian

Proximal Policy Optimization (PPO) is widely used in continuous control due to its robustness and stable training, yet it remains sample-inefficient in tasks with expensive interactions and high-dimensional action spaces. This paper…

Machine Learning · Computer Science 2025-12-16 Tianci Gao, Konstantin A. Neusypin, Dmitry D. Dmitriev, Bo Yang, Shengren Rao