Related papers: Composite Q-learning: Multi-scale Q-function Decom…

Time-Scale Separation in Q-Learning: Extending TD($\triangle$) for Action-Value Function Decomposition

Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that has the objective of approximating action-value functions in order to learn optimal policies. Nonetheless, it has difficulties in reconciling bias with…

Machine Learning · Computer Science 2024-11-22 Mahammad Humayoo

Automatic Reward Shaping from Confounded Offline Data

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science 2025-09-10 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Q($\lambda$) with Off-Policy Corrections

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of…

Artificial Intelligence · Computer Science 2016-08-12 Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos

Composable Deep Reinforcement Learning for Robotic Manipulation

Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the…

Machine Learning · Computer Science 2018-03-20 Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine

Deep Constrained Q-learning

In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e. a weighted combination of…

Machine Learning · Computer Science 2020-09-15 Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Hybrid TD3: Overestimation Bias Analysis and Stable Policy Optimization for Hybrid Action Space

Reinforcement learning in discrete-continuous hybrid action spaces presents fundamental challenges for robotic manipulation, where high-level task decisions and low-level joint-space execution must be jointly optimized. Existing approaches…

Robotics · Computer Science 2026-03-03 Thanh-Tuan Tran, Thanh Nguyen Canh, Nak Young Chong, Xiem HoangVan

Confounding Robust Deep Reinforcement Learning: A Causal Approach

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science 2025-10-27 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of…

Robotics · Computer Science 2016-11-24 Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine

Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning

By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include…

Machine Learning · Computer Science 2023-12-01 Jared Markowitz, Jesse Silverberg, Gary Collins

Simplifying Deep Temporal Difference Learning

Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks…

Machine Learning · Computer Science 2025-04-23 Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin

Bilinear value networks

The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the…

Artificial Intelligence · Computer Science 2023-06-28 Zhang-Wei Hong, Ge Yang, Pulkit Agrawal

Q-function Decomposition with Intervention Semantics with Factored Action Spaces

Many practical reinforcement learning environments have a discrete factored action space that induces a large combinatorial set of actions, thereby posing significant challenges. Existing approaches leverage the regular structure of the…

Machine Learning · Computer Science 2025-05-01 Junkyu Lee, Tian Gao, Elliot Nelson, Miao Liu, Debarun Bhattacharjya, Songtao Lu

Decoupled Q-Chunking

Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where the errors in the value…

Machine Learning · Computer Science 2025-12-15 Qiyang Li, Seohong Park, Sergey Levine

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to…

Robotics · Computer Science 2023-10-18 Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization…

Machine Learning · Computer Science 2020-01-23 Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

Frictional Q-Learning

Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From…

Machine Learning · Computer Science 2026-05-12 Hyunwoo Kim, Hyo Kyung Lee

Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use…

Machine Learning · Computer Science 2019-12-02 Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan

Off-Policy Reinforcement Learning with Delayed Rewards

We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are often not readily accessible or even defined immediately after the agent performs actions. In this work, we first…

Machine Learning · Computer Science 2021-06-23 Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

Off-policy, value-based reinforcement learning methods such as Q-learning are appealing because they can learn from arbitrary experience, including data collected by older policies or other agents. In practice, however, bootstrapping makes…

Artificial Intelligence · Computer Science 2026-05-12 Armaan A. Abraham, Lucy Xiaoyang Shi, Chelsea Finn