Related papers: Behavior Proximal Policy Optimization

POPO: Pessimistic Offline Policy Optimization

Offline reinforcement learning (RL), also known as batch RL, aims to optimize policy from a large pre-recorded dataset without interaction with the environment. This setting offers the promise of utilizing diverse, pre-collected datasets to…

Machine Learning · Computer Science 2021-01-05 Qiang He, Xinwen Hou

Behavior Preference Regression for Offline Reinforcement Learning

Offline reinforcement learning (RL) methods aim to learn optimal policies with access only to trajectories in a fixed dataset. Policy constraint methods formulate policy learning as an optimization problem that balances maximizing reward…

Machine Learning · Computer Science 2025-03-04 Padmanaba Srinivasan, William Knottenbelt

Bounded Ratio Reinforcement Learning

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying…

Machine Learning · Computer Science 2026-05-01 Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its…

Machine Learning · Computer Science 2023-10-18 Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

Bayesian Conservative Policy Optimization (BCPO): A Novel Uncertainty-Calibrated Offline Reinforcement Learning with Credible Lower Bounds

Offline reinforcement learning (RL) aims to learn decision policies from a fixed batch of logged transitions, without additional environment interaction. Despite remarkable empirical progress, offline RL remains fragile under distribution…

Methodology · Statistics 2026-03-16 Debashis Chatterjee

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning

Behavior regularization, which constrains the policy to stay close to some behavior policy, is widely used in offline reinforcement learning (RL) to manage the risk of hazardous exploitation of unseen actions. Nevertheless, existing…

Machine Learning · Computer Science 2025-05-30 Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, Zongzhang Zhang

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

In this paper, we study offline preference-based reinforcement learning (PbRL), where learning is based on pre-collected preference feedback over pairs of trajectories. While offline PbRL has demonstrated remarkable empirical success,…

Machine Learning · Computer Science 2025-06-04 Hyungkyu Kang, Min-hwan Oh

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

Contrastive reinforcement learning (CRL) learns goal-conditioned Q-values through a contrastive objective over state-action and goal representations, removing the need for hand-crafted reward functions. Despite impressive success in…

Machine Learning · Computer Science 2026-05-14 Asim Osman, Sasha Abramowitz, Mark Bergh, Ulrich Armel Mbou Sob, Ruan John de Kock, Omayma Mahjoub, Oussama Hidaoui, Noah De Nicola, Arnol Manuel Fokam, Felix Chalumeau, Daniel Rajaonarivonivelomanantsoa, Siddarth Singh, Refiloe Shabe, Juan Claude Formanek, Simon Verster Du Toit, Arnu Pretorius

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Offline Reinforcement Learning with Closed-Form Policy Improvement Operators

Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while…

Machine Learning · Computer Science 2023-07-25 Jiachen Li, Edwin Zhang, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science 2020-06-22 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Behavior Prior Representation learning for Offline Reinforcement Learning

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the…

Machine Learning · Computer Science 2023-03-01 Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet Des Combes, Romain Laroche

Rethinking the Trust Region in LLM Reinforcement Learning

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio…

Machine Learning · Computer Science 2026-05-27 Penghui Qi, Xiangxin Zhou, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee

Federated Offline Policy Optimization with Dual Regularization

Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the…

Machine Learning · Computer Science 2024-05-30 Sheng Yue, Zerui Qin, Xingyuan Hua, Yongheng Deng, Ju Ren

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

Offline reinforcement learning struggles with distributional shift and constrained performance due to static dataset limitations, while online RL demands prohibitive environment interactions. The recent advent of hybrid offline-to-online…

Machine Learning · Computer Science 2026-05-19 Qisai Liu, Zhanhong Jiang, Joshua Russell Waite, Aditya Balu, Cody Fleming, Soumik Sarkar

Model-based Offline Reinforcement Learning with Local Misspecification

We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy…

Machine Learning · Computer Science 2023-01-30 Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since…

Machine Learning · Computer Science 2023-03-06 Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner