Related papers: Offline Reinforcement Learning with Implicit Q-Lea…

Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy as computing $Q$-values using…

Machine Learning · Computer Science 2023-03-29 Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed…

Machine Learning · Computer Science 2021-06-23 Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning…

Machine Learning · Computer Science 2023-10-13 Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal

Equivariant Offline Reinforcement Learning

Sample efficiency is critical when applying learning-based methods to robotic manipulation due to the high cost of collecting expert demonstrations and the challenges of on-robot policy learning through online Reinforcement Learning (RL).…

Machine Learning · Computer Science 2024-06-21 Arsh Tangri, Ondrej Biza, Dian Wang, David Klee, Owen Howell, Robert Platt

Offline Inverse Reinforcement Learning

The objective of offline RL is to learn optimal policies when a fixed dataset of exploratory demonstrations is available and sampling additional observations is impossible (typically if this operation is either costly or raises ethical…

Machine Learning · Computer Science 2021-06-10 Firas Jarboui, Vianney Perchet

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected,…

Machine Learning · Computer Science 2020-08-20 Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment. However, datasets collected by humans in…

Machine Learning · Computer Science 2024-03-12 Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang

AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned…

Machine Learning · Computer Science 2025-11-06 Longxiang He, Li Shen, Xueqian Wang

Offline Reinforcement Learning with On-Policy Q-Function Regularization

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work…

Machine Learning · Computer Science 2023-07-27 Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

PIQL: Projective Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning

Offline Reinforcement Learning (RL) faces a fundamental challenge of extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL) employs expectile regression to achieve in-sample learning. Nevertheless, IQL…

Machine Learning · Computer Science 2026-02-03 Xinchen Han, Hossam Afifi, Michel Marot

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To…

Machine Learning · Computer Science 2021-10-06 Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment

We study offline reinforcement learning of style-conditioned policies using explicit style supervision via subtrajectory labeling functions. In this setting, aligning style with high task performance is particularly challenging due to…

Machine Learning · Computer Science 2026-02-02 Mathieu Petitbois, Rémy Portelas, Sylvain Lamprier

Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn a policy from a static dataset without further interactions with the environment. Collecting sufficiently large datasets for offline RL is exhausting since this data collection requires…

Artificial Intelligence · Computer Science 2025-10-22 Jongchan Park, Mingyu Park, Donghwan Lee

Mildly Conservative Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a static logged dataset without continually interacting with the environment. The distribution shift between the learned policy and the behavior policy makes it necessary…

Machine Learning · Computer Science 2024-02-22 Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu

BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning

Offline inverse reinforcement learning (IRL) aims to recover a reward function that explains expert behavior using only fixed demonstration data, without any additional online interaction. We propose BiCQL-ML, a policy-free offline IRL…

Machine Learning · Computer Science 2025-12-01 Junsung Park

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we…

Machine Learning · Computer Science 2023-03-31 Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth

Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are…

Machine Learning · Statistics 2022-07-28 Chengchun Shi, Shikai Luo, Yuan Le, Hongtu Zhu, Rui Song

Efficient Offline Reinforcement Learning: First Imitate, then Improve

Supervised imitation-based approaches are often favored over off-policy reinforcement learning approaches for learning policies offline, since their straightforward optimization objective makes them computationally efficient and stable to…

Machine Learning · Computer Science 2025-12-30 Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which…

Machine Learning · Computer Science 2023-05-23 Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine