Related papers: When does return-conditioned supervised learning w…

How to Provably Improve Return Conditioned Supervised Learning?

In sequential decision-making problems, Return-Conditioned Supervised Learning (RCSL) has gained increasing recognition for its simplicity and stability in modern decision-making tasks. Unlike traditional offline reinforcement learning (RL)…

Machine Learning · Computer Science 2025-06-11 Zhishuai Liu , Yu Yang , Ruhan Wang , Pan Xu , Dongruo Zhou

Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy

Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these…

Robotics · Computer Science 2024-03-05 Chenyang Cao , Zichen Yan , Renhao Lu , Junbo Tan , Xueqian Wang

A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory

Offline reinforcement learning (RL) aims to optimize the return given a fixed dataset of agent trajectories without additional interactions with the environment. While algorithm development has progressed rapidly, significant theoretical…

Machine Learning · Computer Science 2025-08-12 Fengdi Che

Learning Optimal and Sample-Efficient Decision Policies with Guarantees

The paradigm of decision-making has been revolutionised by reinforcement learning and deep learning. Although this has led to significant progress in domains such as robotics, healthcare, and finance, the use of RL in practice is…

Machine Learning · Computer Science 2026-02-23 Daqian Shao

State-Constrained Offline Reinforcement Learning

Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of…

Machine Learning · Statistics 2025-07-16 Charles A. Hepburn , Yue Jin , Giovanni Montana

RvS: What is Essential for Offline RL via Supervised Learning?

Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive…

Machine Learning · Computer Science 2022-05-12 Scott Emmons , Benjamin Eysenbach , Ilya Kostrikov , Sergey Levine

Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering

Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected return using a given static dataset of transitions. However, offline RL faces the distribution shift problem. The policy constraint offline RL method is…

Machine Learning · Computer Science 2025-12-24 Yuanhao Chen , Qi Liu , Pengbin Chen , Zhongjian Qiao , Yanjie Li

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Distributionally robust offline reinforcement learning (RL) aims to find a policy that performs the best under the worst environment within an uncertainty set using an offline dataset collected from a nominal model. While recent advances in…

Machine Learning · Computer Science 2025-01-07 Ruiquan Huang , Yingbin Liang , Jing Yang

Multi-Objective Decision Transformers for Offline Reinforcement Learning

Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling…

Machine Learning · Computer Science 2023-09-01 Abdelghani Ghanem , Philippe Ciblat , Mounir Ghogho

Distance Weighted Supervised Learning for Offline Interaction Data

Sequential decision making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods based on supervised learning are robust, but require optimal demonstrations, which…

Machine Learning · Computer Science 2023-04-28 Joey Hejna , Jensen Gao , Dorsa Sadigh

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning…

Machine Learning · Computer Science 2023-10-13 Zhang-Wei Hong , Aviral Kumar , Sathwik Karnik , Abhishek Bhandwaldar , Akash Srivastava , Joni Pajarinen , Romain Laroche , Abhishek Gupta , Pulkit Agrawal

Representation Matters: Offline Pretraining for Sequential Decision Making

The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms. This research…

Machine Learning · Computer Science 2021-02-12 Mengjiao Yang , Ofir Nachum

Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the dataset behaviors, known as policy constraint offline RL. However, existing works heavily rely on the purity of the data,…

Machine Learning · Computer Science 2022-10-20 Chengqian Gao , Ke Xu , Liu Liu , Deheng Ye , Peilin Zhao , Zhiqiang Xu

Offline Reinforcement Learning with Additional Covering Distributions

We study learning optimal policies from a logged dataset, i.e., offline RL, with function approximation. Despite the efforts devoted, existing algorithms with theoretic finite-sample guarantees typically assume exploratory data coverage or…

Machine Learning · Computer Science 2023-05-25 Chenjie Mao

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation…

Machine Learning · Computer Science 2020-10-23 Ruosong Wang , Dean P. Foster , Sham M. Kakade

A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems

With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems, such as playing complex games from pixel observations, sustaining…

Machine Learning · Computer Science 2023-04-20 Rafael Figueiredo Prudencio , Marcos R. O. A. Maximo , Esther Luna Colombini

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected…

Machine Learning · Computer Science 2023-03-15 Han Zheng , Xufang Luo , Pengfei Wei , Xuan Song , Dongsheng Li , Jing Jiang

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset. This…

Machine Learning · Computer Science 2022-04-20 Jongmin Lee , Cosmin Paduraru , Daniel J. Mankowitz , Nicolas Heess , Doina Precup , Kee-Eung Kim , Arthur Guez

Constrained Decision Transformer for Offline Safe Reinforcement Learning

Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem…

Machine Learning · Computer Science 2023-06-22 Zuxin Liu , Zijian Guo , Yihang Yao , Zhepeng Cen , Wenhao Yu , Tingnan Zhang , Ding Zhao

Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in…

Machine Learning · Computer Science 2026-02-11 Prajwal Koirala , Zhanhong Jiang , Soumik Sarkar , Cody Fleming