Related papers: Strategically Conservative Q-Learning

Mildly Conservative Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a static logged dataset without continually interacting with the environment. The distribution shift between the learned policy and the behavior policy makes it necessary…

Machine Learning · Computer Science 2024-02-22 Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected,…

Machine Learning · Computer Science 2020-08-20 Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

Mildly Conservative Regularized Evaluation for Offline Reinforcement Learning

Offline reinforcement learning (RL) seeks to learn optimal policies from static datasets without further environment interaction. A key challenge is the distribution shift between the learned and behavior policies, leading to…

Machine Learning · Computer Science 2025-08-11 Haohui Chen, Zhiyong Chen

RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state-action spaces with limited…

Artificial Intelligence · Computer Science 2026-05-21 Andrew Choi, Wei Xu

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

Offline multi-agent reinforcement learning is challenging due to the coupling effect of both distribution shift issue common in offline setting and the high dimension issue common in multi-agent setting, making the action…

Artificial Intelligence · Computer Science 2023-09-25 Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji

Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control

Offline Reinforcement Learning (RL) is a promising approach for next-generation wireless networks, where online exploration is unsafe and large amounts of operational data can be reused across the model lifecycle. However, the behavior of…

Networking and Internet Architecture · Computer Science 2026-03-05 Nicolas Helson, Pegah Alizadeh, Anastasios Giovanidis

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To…

Machine Learning · Computer Science 2021-10-06 Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Equivariant Offline Reinforcement Learning

Sample efficiency is critical when applying learning-based methods to robotic manipulation due to the high cost of collecting expert demonstrations and the challenges of on-robot policy learning through online Reinforcement Learning (RL).…

Machine Learning · Computer Science 2024-06-21 Arsh Tangri, Ondrej Biza, Dian Wang, David Klee, Owen Howell, Robert Platt

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, these approaches often struggle…

Machine Learning · Computer Science 2024-12-04 Kwanyoung Park, Youngwoon Lee

Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies

When applying offline reinforcement learning (RL) in healthcare scenarios, the out-of-distribution (OOD) issues pose significant risks, as inappropriate generalization beyond clinical expertise can result in potentially harmful…

Machine Learning · Computer Science 2025-05-23 Runze Yan, Xun Shen, Akifumi Wachi, Sebastien Gros, Anni Zhao, Xiao Hu

State-Constrained Offline Reinforcement Learning

Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of…

Machine Learning · Statistics 2025-07-16 Charles A. Hepburn, Yue Jin, Giovanni Montana

Constraints Penalized Q-learning for Safe Offline Reinforcement Learning

We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment.…

Machine Learning · Computer Science 2022-04-11 Haoran Xu, Xianyuan Zhan, Xiangyu Zhu

ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

Offline Reinforcement Learning (RL), which operates solely on static datasets without further interactions with the environment, provides an appealing alternative to learning a safe and promising control policy. The prevailing methods…

Machine Learning · Computer Science 2025-03-18 Kun Wu, Yinuo Zhao, Zhiyuan Xu, Zhengping Che, Chengxiang Yin, Chi Harold Liu, Feiferi Feng, Jian Tang

Imagination-Limited Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning seeks to derive improved policies entirely from historical data but often struggles with over-optimistic value estimates for out-of-distribution (OOD) actions. This issue is typically mitigated via policy…

Machine Learning · Computer Science 2025-05-20 Wenhui Liu, Zhijian Wu, Jingchao Wang, Dingjiang Huang, Shuigeng Zhou

Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration

When safety is formulated as a limit of cumulative cost, safe reinforcement learning (RL) aims to learn policies that maximize return subject to the cost constraint in data collection and deployment. Off-policy safe RL methods, although…

Machine Learning · Computer Science 2026-03-26 Guopeng Li, Matthijs T. J. Spaan, Julian F. P. Kooij

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to…

Machine Learning · Computer Science 2021-10-13 Ilya Kostrikov, Ashvin Nair, Sergey Levine

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Offline reinforcement learning (RL) provides a promising direction to exploit massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be…

Machine Learning · Computer Science 2022-10-25 Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

Reducing Conservativeness Oriented Offline Reinforcement Learning

In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. Towards conservative strategy, current methods choose to regularize the behavior policy or learn a lower bound of the value…

Machine Learning · Computer Science 2021-03-02 Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

Contextual Conservative Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning learns an effective policy on offline datasets without online interaction, and it attracts persistent research attention due to its potential of practical application. However, extrapolation error generated by…

Machine Learning · Computer Science 2023-01-18 Ke Jiang, Jiayu Yao, Xiaoyang Tan

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

"Distribution shift" is the main obstacle to the success of offline reinforcement learning. A learning policy may take actions beyond the behavior policy's knowledge, referred to as Out-of-Distribution (OOD) actions. The Q-values for…

Machine Learning · Computer Science 2025-01-14 Jing Zhang, Linjiajie Fang, Kexin Shi, Wenjia Wang, Bing-Yi Jing