Related papers: Stable Offline Value Function Learning with Bisimu…

Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even…

Machine Learning · Computer Science 2023-10-27 Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche

Behavior Prior Representation learning for Offline Reinforcement Learning

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the…

Machine Learning · Computer Science 2023-03-01 Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet Des Combes, Romain Laroche

VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning

Offline reinforcement learning (RL) learns effective policies from pre-collected datasets, offering a practical solution for applications where online interactions are risky or costly. Model-based approaches are particularly advantageous…

Machine Learning · Computer Science 2026-05-14 Xuyang Chen, Keyu Yan, Guojian Wang, Lin Zhao

Towards Robust Bisimulation Metric Learning

Learned representations in deep reinforcement learning (DRL) have to extract task-relevant information from complex observations, balancing between robustness to distraction and informativeness to the policy. Such stable and rich…

Machine Learning · Computer Science 2021-10-28 Mete Kemertas, Tristan Aumentado-Armstrong

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets, without interacting with the underlying environment during the learning process. A key challenge of offline RL is…

Machine Learning · Computer Science 2022-06-16 Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou

Offline Reinforcement Learning with Value-based Episodic Memory

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data. Most existing offline RL algorithms use regularization or constraints to suppress extrapolation…

Machine Learning · Computer Science 2021-10-20 Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks. Yet, it is still uncommon to deploy a dynamically learning agent (rather than a…

Machine Learning · Computer Science 2022-11-15 Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman

A Unified Framework for Alternating Offline Model Training and Policy Learning

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data, and subsequently utilize the learned model and fixed datasets for policy learning, without further interacting with the…

Machine Learning · Computer Science 2022-10-13 Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou

Instabilities of Offline RL with Pre-Trained Neural Representation

In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.…

Machine Learning · Computer Science 2021-03-09 Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade

Is Value Learning Really the Main Bottleneck in Offline RL?

While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results…

Machine Learning · Computer Science 2024-10-30 Seohong Park, Kevin Frans, Sergey Levine, Aviral Kumar

COMBO: Conservative Offline Model-Based Policy Optimization

Model-based algorithms, which learn a dynamics model from logged experience and perform some sort of pessimistic planning under the learned model, have emerged as a promising paradigm for offline reinforcement learning (offline RL).…

Machine Learning · Computer Science 2022-01-28 Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Bi-Level Offline Policy Optimization with Limited Exploration

We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration,…

Machine Learning · Computer Science 2023-10-11 Wenzhuo Zhou

High-confidence error estimates for learned value functions

Estimating the value function for a fixed policy is a fundamental problem in reinforcement learning. Policy evaluation algorithms---to estimate value functions---continue to be developed, to improve convergence rates, improve stability and…

Machine Learning · Statistics 2018-08-29 Touqir Sajed, Wesley Chung, Martha White

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since…

Machine Learning · Computer Science 2023-03-06 Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

CROP: Conservative Reward for Model-based Offline Policy Optimization

Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges because of their capability to mitigate…

Machine Learning · Computer Science 2026-04-14 Hao Li, Xiao-Hu Zhou, Shu-Hai Li, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zeng-Guang Hou

Representation Matters: Offline Pretraining for Sequential Decision Making

The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms. This research…

Machine Learning · Computer Science 2021-02-12 Mengjiao Yang, Ofir Nachum

Conservative State Value Estimation for Offline Reinforcement Learning

Offline reinforcement learning faces a significant challenge of value over-estimation due to the distributional drift between the dataset and the current learned policy, leading to learning failure in practice. The common approach is to…

Machine Learning · Computer Science 2023-12-05 Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, Qingwei Lin, Saravan Rajmohan, Thomas Moscibroda, Dongmei Zhang

Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or…

Machine Learning · Computer Science 2022-08-31 Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu

Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel…

Machine Learning · Computer Science 2024-05-29 Johannes Ackermann, Takayuki Osa, Masashi Sugiyama

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana