Related papers: Behavior-Regularized Diffusion Policy Optimization…

Behavior Proximal Policy Optimization

Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-critic methods perform poorly due to the overestimation of out-of-distribution state-action pairs. Thus, various additional augmentations are…

Machine Learning · Computer Science 2023-02-23 Zifeng Zhuang , Kun Lei , Jinxin Liu , Donglin Wang , Yilang Guo

Score Regularized Policy Optimization through Diffusion Behavior

Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow…

Machine Learning · Computer Science 2024-03-18 Huayu Chen , Cheng Lu , Zhengyi Wang , Hang Su , Jun Zhu

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its…

Machine Learning · Computer Science 2023-10-18 Xiaohan Hu , Yi Ma , Chenjun Xiao , Yan Zheng , Jianye Hao

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for…

Machine Learning · Computer Science 2024-05-30 Tianle Zhang , Jiayi Guan , Lin Zhao , Yihang Li , Dongjiang Li , Zecui Zeng , Lei Sun , Yue Chen , Xuelong Wei , Lusong Li , Xiaodong He

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function…

Machine Learning · Computer Science 2023-08-29 Zhendong Wang , Jonathan J Hunt , Mingyuan Zhou

Diffusion Policy Policy Optimization

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy…

Robotics · Computer Science 2024-12-11 Allen Z. Ren , Justin Lidard , Lars L. Ankile , Anthony Simeonov , Pulkit Agrawal , Anirudha Majumdar , Benjamin Burchfiel , Hongkai Dai , Max Simchowitz

Diffusion Policy through Conditional Proximal Policy Optimization

Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more…

Machine Learning · Computer Science 2026-03-06 Ben Liu , Shunpeng Yang , Hua Chen

BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning

Online interactions with the environment to collect data samples for training a Reinforcement Learning (RL) agent is not always feasible due to economic and safety concerns. The goal of Offline Reinforcement Learning is to address this…

Machine Learning · Computer Science 2021-10-05 Chi Zhang , Sanmukh Rao Kuppannagari , Viktor K Prasanna

Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning

In offline reinforcement learning, value overestimation caused by out-of-distribution (OOD) actions significantly limits policy performance. Recently, diffusion models have been leveraged for their strong distribution-matching capabilities,…

Machine Learning · Computer Science 2025-11-13 Yunchang Ma , Tenglong Liu , Yixing Lan , Xin Yin , Changxin Zhang , Xinglong Zhang , Xin Xu

Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset -- common in realistic tasks to prevent unsafe exploration. To address…

Machine Learning · Computer Science 2025-09-08 Junyu Guo , Zhi Zheng , Donghao Ying , Ming Jin , Shangding Gu , Costas Spanos , Javad Lavaei

Federated Offline Policy Optimization with Dual Regularization

Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the…

Machine Learning · Computer Science 2024-05-30 Sheng Yue , Zerui Qin , Xingyuan Hua , Yongheng Deng , Ju Ren

Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL…

Machine Learning · Computer Science 2024-12-17 Shutong Ding , Ke Hu , Zhenhao Zhang , Kan Ren , Weinan Zhang , Jingyi Yu , Jingya Wang , Ye Shi

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies

Diffusion models have achieved remarkable success in sequential decision-making by leveraging the highly expressive model capabilities in policy learning. A central problem for learning diffusion policies is to align the policy output with…

Machine Learning · Computer Science 2024-12-17 Zhao Shan , Chenyou Fan , Shuang Qiu , Jiyuan Shi , Chenjia Bai

DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent cannot often cover the action distribution under a given…

Machine Learning · Computer Science 2024-06-14 Xuemin Hu , Shen Li , Yingfen Xu , Bo Tang , Long Chen

SelfBC: Self Behavior Cloning for Offline Reinforcement Learning

Policy constraint methods in offline reinforcement learning employ additional regularization techniques to constrain the discrepancy between the learned policy and the offline dataset. However, these methods tend to result in overly…

Machine Learning · Computer Science 2024-08-06 Shirong Liu , Chenjia Bai , Zixian Guo , Hao Zhang , Gaurav Sharma , Yang Liu

Efficient Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffsuion-QL significantly boosts the performance of offline RL by…

Machine Learning · Computer Science 2023-10-27 Bingyi Kang , Xiao Ma , Chao Du , Tianyu Pang , Shuicheng Yan

Model-Based Offline Meta-Reinforcement Learning with Regularization

Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these…

Machine Learning · Computer Science 2022-07-14 Sen Lin , Jialin Wan , Tengyu Xu , Yingbin Liang , Junshan Zhang

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these…

Machine Learning · Computer Science 2026-05-21 Haitong Ma , Ofir Nabati , Aviv Rosenberg , Bo Dai , Oran Lang , Craig Boutilier , Na Li , Shie Mannor , Lior Shani , Guy Tenneholtz

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu , Garrett Thomas , Lantao Yu , Stefano Ermon , James Zou , Sergey Levine , Chelsea Finn , Tengyu Ma

Dichotomous Diffusion Policy Optimization

Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion…

Machine Learning · Computer Science 2026-02-03 Ruiming Liang , Yinan Zheng , Kexin Zheng , Tianyi Tan , Jianxiong Li , Liyuan Mao , Zhihao Wang , Guang Chen , Hangjun Ye , Jingjing Liu , Jinqiao Wang , Xianyuan Zhan