Related papers: Continuous Doubly Constrained Batch Reinforcement …

Interpretable performance analysis towards offline reinforcement learning: A dataset perspective

Offline reinforcement learning (RL) has increasingly become the focus of the artificial intelligent research due to its wide real-world applications where the collection of data may be difficult, time-consuming, or costly. In this paper, we…

Machine Learning · Computer Science 2021-05-13 Chenyang Xi , Bo Tang , Jiajun Shen , Xinfu Liu , Feiyu Xiong , Xueying Li

Critic Regularized Regression

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data…

Machine Learning · Computer Science 2021-09-24 Ziyu Wang , Alexander Novikov , Konrad Zolna , Jost Tobias Springenberg , Scott Reed , Bobak Shahriari , Noah Siegel , Josh Merel , Caglar Gulcehre , Nicolas Heess , Nando de Freitas

State-Constrained Offline Reinforcement Learning

Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of…

Machine Learning · Statistics 2025-07-16 Charles A. Hepburn , Yue Jin , Giovanni Montana

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. These are critical shortcomings for applying RL to real-world problems where…

Machine Learning · Computer Science 2019-07-09 Natasha Jaques , Asma Ghandeharioun , Judy Hanwen Shen , Craig Ferguson , Agata Lapedriza , Noah Jones , Shixiang Gu , Rosalind Picard

A Minimalist Approach to Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing…

Machine Learning · Computer Science 2021-12-06 Scott Fujimoto , Shixiang Shane Gu

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

The offline reinforcement learning (RL) setting (also known as full batch RL), where a policy is learned from a static dataset, is compelling as progress enables RL methods to take advantage of large, previously-collected datasets, much…

Machine Learning · Computer Science 2021-02-09 Justin Fu , Aviral Kumar , Ofir Nachum , George Tucker , Sergey Levine

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions…

Machine Learning · Computer Science 2020-07-23 Yao Liu , Adith Swaminathan , Alekh Agarwal , Emma Brunskill

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning…

Machine Learning · Computer Science 2023-10-13 Zhang-Wei Hong , Aviral Kumar , Sathwik Karnik , Abhishek Bhandwaldar , Akash Srivastava , Joni Pajarinen , Romain Laroche , Abhishek Gupta , Pulkit Agrawal

BRPO: Batch Residual Policy Optimization

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum…

Machine Learning · Computer Science 2020-03-31 Sungryull Sohn , Yinlam Chow , Jayden Ooi , Ofir Nachum , Honglak Lee , Ed Chi , Craig Boutilier

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected…

Machine Learning · Computer Science 2023-03-15 Han Zheng , Xufang Luo , Pengfei Wei , Xuan Song , Dongsheng Li , Jing Jiang

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. Often these applications require us to 1) identify a near optimal policy or to 2) estimate the…

Machine Learning · Computer Science 2021-06-22 Andrea Zanette

Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts

Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When…

Machine Learning · Computer Science 2021-11-19 Riad Akrour , Davide Tateo , Jan Peters

Improving and Benchmarking Offline Reinforcement Learning Algorithms

Recently, Offline Reinforcement Learning (RL) has achieved remarkable progress with the emergence of various algorithms and datasets. However, these methods usually focus on algorithmic advancements, ignoring that many low-level…

Machine Learning · Computer Science 2023-06-02 Bingyi Kang , Xiao Ma , Yirui Wang , Yang Yue , Shuicheng Yan

Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the dataset behaviors, known as policy constraint offline RL. However, existing works heavily rely on the purity of the data,…

Machine Learning · Computer Science 2022-10-20 Chengqian Gao , Ke Xu , Liu Liu , Deheng Ye , Peilin Zhao , Zhiqiang Xu

STEEL: Singularity-aware Reinforcement Learning

Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. The existing methods require absolutely continuous assumption (e.g., there…

Machine Learning · Statistics 2024-06-27 Xiaohong Chen , Zhengling Qi , Runzhe Wan

Constraints Penalized Q-learning for Safe Offline Reinforcement Learning

We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment.…

Machine Learning · Computer Science 2022-04-11 Haoran Xu , Xianyuan Zhan , Xiangyu Zhu

Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains

Reinforcement learning algorithms have had tremendous successes in online learning settings. However, these successes have relied on low-stakes interactions between the algorithmic agent and its environment. In many settings where RL could…

Machine Learning · Computer Science 2020-06-05 James Bannon , Brad Windsor , Wenbo Song , Tao Li

Locally Constrained Representations in Reinforcement Learning

The success of Reinforcement Learning (RL) heavily relies on the ability to learn robust representations from the observations of the environment. In most cases, the representations learned purely by the reinforcement learning loss can…

Machine Learning · Computer Science 2024-02-12 Somjit Nath , Rushiv Arora , Samira Ebrahimi Kahou

Offline Reinforcement Learning with Adaptive Behavior Regularization

Offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static and previously collected datasets without additional interaction with the environment. The major obstacle to offline RL…

Machine Learning · Computer Science 2022-11-16 Yunfan Zhou , Xijun Li , Qingyu Qu

Off-Policy Deep Reinforcement Learning without Exploration

Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to…

Machine Learning · Computer Science 2019-08-13 Scott Fujimoto , David Meger , Doina Precup