English
Related papers

Related papers: Behavior Proximal Policy Optimization

200 papers

Offline reinforcement learning (RL), also known as batch RL, aims to optimize policy from a large pre-recorded dataset without interaction with the environment. This setting offers the promise of utilizing diverse, pre-collected datasets to…

Machine Learning · Computer Science 2021-01-05 Qiang He , Xinwen Hou

Offline reinforcement learning (RL) methods aim to learn optimal policies with access only to trajectories in a fixed dataset. Policy constraint methods formulate policy learning as an optimization problem that balances maximizing reward…

Machine Learning · Computer Science 2025-03-04 Padmanaba Srinivasan , William Knottenbelt

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying…

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its…

Machine Learning · Computer Science 2023-10-18 Xiaohan Hu , Yi Ma , Chenjun Xiao , Yan Zheng , Jianye Hao

Offline reinforcement learning (RL) aims to learn decision policies from a fixed batch of logged transitions, without additional environment interaction. Despite remarkable empirical progress, offline RL remains fragile under distribution…

Methodology · Statistics 2026-03-16 Debashis Chatterjee

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu , Garrett Thomas , Lantao Yu , Stefano Ermon , James Zou , Sergey Levine , Chelsea Finn , Tengyu Ma

Behavior regularization, which constrains the policy to stay close to some behavior policy, is widely used in offline reinforcement learning (RL) to manage the risk of hazardous exploitation of unseen actions. Nevertheless, existing…

Machine Learning · Computer Science 2025-05-30 Chen-Xiao Gao , Chenyang Wu , Mingjun Cao , Chenjun Xiao , Yang Yu , Zongzhang Zhang

In this paper, we study offline preference-based reinforcement learning (PbRL), where learning is based on pre-collected preference feedback over pairs of trajectories. While offline PbRL has demonstrated remarkable empirical success,…

Machine Learning · Computer Science 2025-06-04 Hyungkyu Kang , Min-hwan Oh

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu , Jeremie Houssineau , Giovanni Montana

Contrastive reinforcement learning (CRL) learns goal-conditioned Q-values through a contrastive objective over state-action and goal representations, removing the need for hand-crafted reward functions. Despite impressive success in…

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan , Renye Yan , Xiaoyang Tan , Zhe Wu , Junliang Xing

Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while…

Machine Learning · Computer Science 2023-07-25 Jiachen Li , Edwin Zhang , Ming Yin , Qinxun Bai , Yu-Xiang Wang , William Yang Wang

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang , Hao He , Chao Wen , Xiaoyang Tan

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science 2020-06-22 Ahmed Touati , Amy Zhang , Joelle Pineau , Pascal Vincent

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the…

Machine Learning · Computer Science 2023-03-01 Hongyu Zang , Xin Li , Jie Yu , Chen Liu , Riashat Islam , Remi Tachet Des Combes , Romain Laroche

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio…

Machine Learning · Computer Science 2026-05-27 Penghui Qi , Xiangxin Zhou , Zichen Liu , Tianyu Pang , Chao Du , Min Lin , Wee Sun Lee

Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the…

Machine Learning · Computer Science 2024-05-30 Sheng Yue , Zerui Qin , Xingyuan Hua , Yongheng Deng , Ju Ren

Offline reinforcement learning struggles with distributional shift and constrained performance due to static dataset limitations, while online RL demands prohibitive environment interactions. The recent advent of hybrid offline-to-online…

Machine Learning · Computer Science 2026-05-19 Qisai Liu , Zhanhong Jiang , Joshua Russell Waite , Aditya Balu , Cody Fleming , Soumik Sarkar

We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy…

Machine Learning · Computer Science 2023-01-30 Kefan Dong , Yannis Flet-Berliac , Allen Nie , Emma Brunskill

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since…

Machine Learning · Computer Science 2023-03-06 Jihwan Jeong , Xiaoyu Wang , Michael Gimelfarb , Hyunwoo Kim , Baher Abdulhai , Scott Sanner
‹ Prev 1 2 3 10 Next ›