Related papers: Implicitly Regularized RL with Implicit Q-Values

Quinoa: a Q-function You Infer Normalized Over Actions

We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form. We use recent…

Machine Learning · Computer Science 2019-11-06 Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

Offline Reinforcement Learning with On-Policy Q-Function Regularization

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work…

Machine Learning · Computer Science 2023-07-27 Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to…

Machine Learning · Computer Science 2021-10-13 Ilya Kostrikov, Ashvin Nair, Sergey Levine

Regularized Softmax Deep Multi-Agent $Q$-Learning

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we…

Machine Learning · Computer Science 2021-06-14 Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning

We propose Q-Policy, a hybrid quantum-classical reinforcement learning (RL) framework that mathematically accelerates policy evaluation and optimization by exploiting quantum computing primitives. Q-Policy encodes value functions in quantum…

Machine Learning · Computer Science 2025-06-10 Kalyan Cherukuri, Aarav Lala, Yash Yardi

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Entropy-regularized algorithms such as Soft Q-learning and Soft Actor-Critic recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics 2019-10-15 Elena Smirnova, Elvis Dohmatob

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and…

Machine Learning · Computer Science 2018-08-10 Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

Deep Reinforcement Learning with Adjustments

Deep reinforcement learning (RL) algorithms can learn complex policies to optimize agent operation over time. RL algorithms have shown promising results in solving complicated problems in recent years. However, their application on…

Machine Learning · Computer Science 2021-09-29 Hamed Khorasgani, Haiyan Wang, Chetan Gupta, Susumu Serita

Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare

Many reinforcement learning (RL) applications have combinatorial action spaces, where each action is a composition of sub-actions. A standard RL approach ignores this inherent factorization structure, resulting in a potential failure to…

Machine Learning · Computer Science 2023-05-04 Shengpu Tang, Maggie Makar, Michael W. Sjoding, Finale Doshi-Velez, Jenna Wiens

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected,…

Machine Learning · Computer Science 2020-08-20 Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

Bridging the Gap Between Value and Policy Based Reinforcement Learning

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that…

Artificial Intelligence · Computer Science 2017-11-27 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which…

Machine Learning · Computer Science 2023-05-23 Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine

Revisiting the Softmax Bellman Operator: New Benefits and New Perspective

The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator.…

Machine Learning · Computer Science 2019-05-21 Zhao Song, Ronald E. Parr, Lawrence Carin

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Learning from datasets without interaction with environments (Offline Learning) is an essential step in applying Reinforcement Learning (RL) algorithms to real-world scenarios. However, compared with the single-agent counterpart, offline…

Artificial Intelligence · Computer Science 2021-10-27 Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao

Off-Policy Reinforcement Learning with Delayed Rewards

We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are often not readily accessible or even defined immediately after the agent performs actions. In this work, we first…

Machine Learning · Computer Science 2021-06-23 Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Recent methods for imitation learning directly learn a $Q$-function using an implicit reward formulation rather than an explicit reward function. However, these methods generally require implicit reward regularization to improve stability…

Machine Learning · Computer Science 2023-03-02 Firas Al-Hafez, Davide Tateo, Oleg Arenz, Guoping Zhao, Jan Peters

Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

In offline reinforcement learning (RL), we have no opportunity to explore, so we must assume that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness,…

Machine Learning · Computer Science 2023-11-14 Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned…

Machine Learning · Computer Science 2025-11-06 Longxiang He, Li Shen, Xueqian Wang

Offline RL Without Off-Policy Evaluation

Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using…

Machine Learning · Computer Science 2021-12-06 David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

Soft Q Network

Deep Q Network (DQN) is a very successful algorithm, yet the inherent problem of reinforcement learning, i.e., the exploration-exploitation balance, remains. In this work, we introduce entropy regularization into DQN and propose SQN. We find that the…

Machine Learning · Computer Science 2020-12-15 Jingbin Liu, Shuai Liu, Xinyang Gu