Related papers: On Instance-Dependent Bounds for Offline Reinforce…

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the…

Machine Learning · Computer Science 2021-10-19 Ming Yin, Yu-Xiang Wang

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation…

Machine Learning · Computer Science 2024-10-10 Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces.…

Machine Learning · Computer Science 2025-11-03 Zhishuai Liu, Pan Xu

The Least Restriction for Offline Reinforcement Learning

Many practical applications of reinforcement learning (RL) constrain the agent to learn from a fixed offline dataset of logged interactions, which has already been gathered, without offering further possibility for data collection. However,…

Machine Learning · Computer Science 2021-07-06 Zizhou Su

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset, two main categories of methods are used:…

Machine Learning · Computer Science 2023-07-04 Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures

This article reviews the recent advances on the statistical foundation of reinforcement learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL is the appropriate model for almost any real-life ML…

Machine Learning · Computer Science 2025-01-07 Ming Yin, Mengdi Wang, Yu-Xiang Wang

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation…

Machine Learning · Computer Science 2020-10-23 Ruosong Wang, Dean P. Foster, Sham M. Kakade

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Due to the advantage that appropriate function approximators…

Machine Learning · Computer Science 2022-03-14 Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset. This…

Machine Learning · Computer Science 2022-04-20 Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez

Settling the Sample Complexity of Model-Based Offline Reinforcement Learning

This paper is concerned with offline reinforcement learning (RL), which learns using pre-collected data without further exploration. Effective offline RL would be able to accommodate distribution shift and limited data coverage. However,…

Machine Learning · Statistics 2024-03-11 Gen Li, Laixi Shi, Yuxin Chen, Yuejie Chi, Yuting Wei

Offline Reinforcement Learning with Additional Covering Distributions

We study learning optimal policies from a logged dataset, i.e., offline RL, with function approximation. Despite the efforts devoted, existing algorithms with theoretic finite-sample guarantees typically assume exploratory data coverage or…

Machine Learning · Computer Science 2023-05-25 Chenjie Mao

Provably Efficient Offline-to-Online Value Adaptation with General Function Approximation

We study value adaptation in offline-to-online reinforcement learning under general function approximation. Starting from an imperfect offline pretrained $Q$-function, the learner aims to adapt it to the target environment using only a…

Machine Learning · Computer Science 2026-04-16 Shangzhe Li, Weitong Zhang

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of…

Machine Learning · Computer Science 2021-06-22 Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by…

Machine Learning · Computer Science 2022-02-24 Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game

Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected dataset without further interactions with the environment. While various algorithms have been proposed for offline RL in the previous literature,…

Machine Learning · Computer Science 2023-03-02 Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Liwei Wang, Tong Zhang

On Gap-dependent Bounds for Offline Reinforcement Learning

This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. Prior work showed when the density ratio between an optimal policy and the behavior policy is upper bounded (the optimal policy…

Machine Learning · Computer Science 2022-08-05 Xinqi Wang, Qiwen Cui, Simon S. Du

Online Optimization for Offline Safe Reinforcement Learning

We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a…

Machine Learning · Computer Science 2025-10-28 Yassine Chemingui, Aryan Deshwal, Alan Fern, Thanh Nguyen-Tang, Janardhan Rao Doppa

A Minimalist Approach to Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing…

Machine Learning · Computer Science 2021-12-06 Scott Fujimoto, Shixiang Shane Gu

Offline Reinforcement Learning via Inverse Optimization

Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function…

Machine Learning · Computer Science 2026-03-19 Ioannis Dimanidis, Tolga Ok, Peyman Mohajerin Esfahani