Related papers: Composite Q-learning: Multi-scale Q-function Decom…

200 papers

Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that has the objective of approximating action-value functions in order to learn optimal policies. Nonetheless, it has difficulties in reconciling bias with…

Machine Learning · Computer Science 2024-11-22 Mahammad Humayoo

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science 2025-09-10 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of…

Artificial Intelligence · Computer Science 2016-08-12 Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos

Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the…

Machine Learning · Computer Science 2018-03-20 Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine

In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e. a weighted combination of…

Machine Learning · Computer Science 2020-09-15 Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Reinforcement learning in discrete-continuous hybrid action spaces presents fundamental challenges for robotic manipulation, where high-level task decisions and low-level joint-space execution must be jointly optimized. Existing approaches…

Robotics · Computer Science 2026-03-03 Thanh-Tuan Tran, Thanh Nguyen Canh, Nak Young Chong, Xiem HoangVan

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science 2025-10-27 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of…

Robotics · Computer Science 2016-11-24 Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine

By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include…

Machine Learning · Computer Science 2023-12-01 Jared Markowitz, Jesse Silverberg, Gary Collins

Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks…

Machine Learning · Computer Science 2025-04-23 Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin

The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the…

Artificial Intelligence · Computer Science 2023-06-28 Zhang-Wei Hong, Ge Yang, Pulkit Agrawal

Many practical reinforcement learning environments have a discrete factored action space that induces a large combinatorial set of actions, thereby posing significant challenges. Existing approaches leverage the regular structure of the…

Machine Learning · Computer Science 2025-05-01 Junkyu Lee, Tian Gao, Elliot Nelson, Miao Liu, Debarun Bhattacharjya, Songtao Lu

Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where the errors in the value…

Machine Learning · Computer Science 2025-12-15 Qiyang Li, Seohong Park, Sergey Levine

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to…

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization…

Machine Learning · Computer Science 2020-01-23 Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From…

Machine Learning · Computer Science 2026-05-12 Hyunwoo Kim, Hyo Kyung Lee

The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use…

Machine Learning · Computer Science 2019-12-02 Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan

We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are often not readily accessible or even defined immediately after the agent performs actions. In this work, we first…

Machine Learning · Computer Science 2021-06-23 Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng

Off-policy, value-based reinforcement learning methods such as Q-learning are appealing because they can learn from arbitrary experience, including data collected by older policies or other agents. In practice, however, bootstrapping makes…

Artificial Intelligence · Computer Science 2026-05-12 Armaan A. Abraham, Lucy Xiaoyang Shi, Chelsea Finn