Related papers: VIPeR: Provably Efficient Algorithm for Offline RL…

VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning

Offline reinforcement learning (RL) learns effective policies from pre-collected datasets, offering a practical solution for applications where online interactions are risky or costly. Model-based approaches are particularly advantageous…

Machine Learning · Computer Science 2026-05-14 Xuyang Chen, Keyu Yan, Guojian Wang, Lin Zhao

Is Pessimism Provably Efficient for Offline RL?

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of…

Machine Learning · Computer Science 2022-05-06 Ying Jin, Zhuoran Yang, Zhaoran Wang

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Due to the advantage that appropriate function approximators…

Machine Learning · Computer Science 2022-03-14 Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair. In many real world applications, however, an agent can observe only a score that represents the quality of the…

Machine Learning · Computer Science 2023-04-20 Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang

Offline Reinforcement Learning with Value-based Episodic Memory

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data. Most existing offline RL algorithms use regularization or constraints to suppress extrapolation…

Machine Learning · Computer Science 2021-10-20 Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang

The Virtues of Pessimism in Inverse Reinforcement Learning

Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations. However, it traditionally requires repeatedly solving a computationally expensive reinforcement learning (RL) problem in…

Machine Learning · Computer Science 2024-02-09 David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the…

Machine Learning · Computer Science 2021-10-19 Ming Yin, Yu-Xiang Wang

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation…

Machine Learning · Computer Science 2024-10-10 Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work…

Machine Learning · Computer Science 2024-07-11 Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

In this paper, we study distributionally robust offline reinforcement learning (robust offline RL), which seeks to find an optimal policy purely from an offline dataset that can perform well in perturbed environments. In specific, we…

Machine Learning · Computer Science 2023-08-23 Jose Blanchet, Miao Lu, Tong Zhang, Han Zhong

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset, two main categories of methods are used:…

Machine Learning · Computer Science 2023-07-04 Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of prior work has yielded the minimax-optimal bound of $\tilde{\mathcal{O}}(\frac{1}{\sqrt{K}})$, with $K$…

Machine Learning · Computer Science 2023-01-30 Thanh Nguyen-Tang, Ming Yin, Sunil Gupta, Svetha Venkatesh, Raman Arora

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. Such…

Machine Learning · Computer Science 2024-04-02 Miao Lu, Yifei Min, Zhaoran Wang, Zhuoran Yang

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior. It is well-known that the IRL problem is fundamentally ill-posed, i.e., many reward functions can explain the…

Machine Learning · Computer Science 2024-06-07 Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

Survival Instinct in Offline Reinforcement Learning

We present a novel observation about the behavior of offline reinforcement learning (RL) algorithms: on many benchmark datasets, offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels, such as…

Machine Learning · Computer Science 2023-11-09 Anqi Li, Dipendra Misra, Andrey Kolobov, Ching-An Cheng

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas…

Machine Learning · Computer Science 2025-02-20 Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration

While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement…

Machine Learning · Computer Science 2026-03-20 Amirhossein Roknilamouki, Arnob Ghosh, Eylem Ekici, Ness B. Shroff

COMBO: Conservative Offline Model-Based Policy Optimization

Model-based algorithms, which learn a dynamics model from logged experience and perform some sort of pessimistic planning under the learned model, have emerged as a promising paradigm for offline reinforcement learning (offline RL).…

Machine Learning · Computer Science 2022-01-28 Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Dynamic mechanism design has garnered significant attention from both computer scientists and economists in recent years. By allowing agents to interact with the seller over multiple rounds, where agents' reward functions may change with…

Machine Learning · Computer Science 2022-06-22 Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang

VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Continual Learning

Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables the systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong