Related papers: Regularized Behavior Value Estimation

200 papers

Online interaction with the environment to collect data samples for training a Reinforcement Learning (RL) agent is not always feasible due to economic and safety concerns. The goal of Offline Reinforcement Learning is to address this…

Machine Learning · Computer Science 2021-10-05 Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna

Offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static and previously collected datasets without additional interaction with the environment. The major obstacle to offline RL…

Machine Learning · Computer Science 2022-11-16 Yunfan Zhou, Xijun Li, Qingyu Qu

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment. However, in many real-world applications, access to the environment is limited to a fixed offline dataset of logged…

Machine Learning · Computer Science 2019-11-27 Yifan Wu, George Tucker, Ofir Nachum

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Offline reinforcement learning aims to utilize datasets of previously gathered environment-action interaction records to learn a policy without access to the real environment. Recent work has shown that offline reinforcement learning can be…

Machine Learning · Computer Science 2023-08-30 Hanhan Zhou, Tian Lan, Vaneet Aggarwal

The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. While policy constraints, conservatism, and other…

Artificial Intelligence · Computer Science 2023-10-19 Jianlan Luo, Perry Dong, Jeffrey Wu, Aviral Kumar, Xinyang Geng, Sergey Levine

Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that lays the foundation of many offline RL algorithms. Behavior estimation aims…

Machine Learning · Computer Science 2023-05-29 Guoxi Zhang, Hisashi Kashima

Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms…

Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one…

Machine Learning · Computer Science 2024-07-30 Padmanaba Srinivasan, William Knottenbelt

We propose a policy improvement algorithm for Reinforcement Learning (RL) which is called Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when…

Machine Learning · Computer Science 2019-07-12 Elad Sarafian, Aviv Tamar, Sarit Kraus

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang, Rui Gao, Hongyuan Zha

Offline reinforcement learning (RL) methods aim to learn optimal policies with access only to trajectories in a fixed dataset. Policy constraint methods formulate policy learning as an optimization problem that balances maximizing reward…

Machine Learning · Computer Science 2025-03-04 Padmanaba Srinivasan, William Knottenbelt

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data. Most existing offline RL algorithms use regularization or constraints to suppress extrapolation…

Machine Learning · Computer Science 2021-10-20 Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the…

Machine Learning · Computer Science 2023-03-01 Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet Des Combes, Romain Laroche

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In…

Machine Learning · Computer Science 2019-12-16 Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. van der Laan

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Reinforcement learning with verifiable rewards (RLVR) for Large Reasoning Models hinges on baseline estimation for variance reduction, but existing approaches pay a heavy price: PPO requires a policy-model scale critic, while GRPO needs…

Machine Learning · Computer Science 2026-05-12 Yunho Choi, Jongwon Lim, Woojin Ahn, Minjae Oh, Jeonghoon Shim, Yohan Jo

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its…

Machine Learning · Computer Science 2023-10-18 Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

Value function estimation is an indispensable subroutine in reinforcement learning, which becomes more challenging in the offline setting. In this paper, we propose Hybrid Value Estimation (HVE) to reduce value estimation error, which…

Machine Learning · Computer Science 2022-06-07 Xue-Kun Jin, Xu-Hui Liu, Shengyi Jiang, Yang Yu

Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions. Extracting policies from diverse offline datasets has the potential to expand the range of…

Machine Learning · Computer Science 2021-06-21 Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin