Related papers: Simplifying Deep Temporal Difference Learning

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

In state-of-the-art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a…

Machine Learning · Computer Science 2020-07-16 Sabrina Hoppe, Marc Toussaint

Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning

Deep Reinforcement Learning (RL) methods rely on experience replay to approximate the minibatched supervised learning setting; however, unlike supervised learning where access to lots of training data is crucial to generalization,…

Machine Learning · Computer Science 2021-02-24 Brett Daley, Cameron Hickert, Christopher Amato

Efficient Off-Policy Reinforcement Learning via Brain-Inspired Computing

Reinforcement Learning (RL) has opened up new opportunities to enhance existing smart systems that generally include a complex decision-making process. However, modern RL algorithms, e.g., Deep Q-Networks (DQN), are based on deep neural…

Machine Learning · Computer Science 2023-06-22 Yang Ni, Danny Abraham, Mariam Issa, Yeseong Kim, Pietro Mercati, Mohsen Imani

An Optimistic Perspective on Offline Reinforcement Learning

Off-policy reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important consideration in real world applications. This paper studies offline RL using the DQN replay dataset comprising the entire replay…

Machine Learning · Computer Science 2020-11-25 Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

Time-Aware Q-Networks: Resolving Temporal Irregularity for Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) has shown outstanding performance on inducing effective action policies that maximize expected long-term return on many complex tasks. Much of DRL work has been focused on sequences of events with discrete…

Machine Learning · Computer Science 2021-05-07 Yeo Jin Kim, Min Chi

Faster Deep Reinforcement Learning with Slower Online Network

Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge…

Machine Learning · Computer Science 2023-04-19 Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

Offline Reinforcement Learning with On-Policy Q-Function Regularization

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work…

Machine Learning · Computer Science 2023-07-27 Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

Distillation Strategies for Proximal Policy Optimization

Vision-based deep reinforcement learning (RL) typically obtains performance benefit by using high capacity and relatively large convolutional neural networks (CNN). However, a large network leads to higher inference costs (power, latency,…

Machine Learning · Computer Science 2019-05-01 Sam Green, Craig M. Vineyard, Çetin Kaya Koç

Automatic Reward Shaping from Confounded Offline Data

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science 2025-09-10 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed…

Machine Learning · Computer Science 2022-06-03 Andrea Zanette, Martin J. Wainwright

Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn a policy from a static dataset without further interactions with the environment. Collecting sufficiently large datasets for offline RL is exhausting since this data collection requires…

Artificial Intelligence · Computer Science 2025-10-22 Jongchan Park, Mingyu Park, Donghwan Lee

Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization

Learning complex policies with Reinforcement Learning (RL) is often hindered by instability and slow convergence, a problem exacerbated by the difficulty of reward engineering. Imitation Learning (IL) from expert demonstrations bypasses…

Machine Learning · Computer Science 2026-05-19 Sayambhu Sen, Shalabh Bhatnagar

A Convergent Off-Policy Temporal Difference Algorithm

Learning the value function of a given policy (target policy) from the data samples obtained from a different policy (behavior policy) is an important problem in Reinforcement Learning (RL). This problem is studied under the setting of…

Machine Learning · Computer Science 2019-11-14 Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

Training agents via off-policy deep reinforcement learning (RL) requires a large memory, named replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches…

Machine Learning · Computer Science 2022-12-27 Bumgeun Park, Taeyoung Kim, Woohyeon Moon, Luiz Felipe Vecchietti, Dongsoo Har

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

While agents trained by Reinforcement Learning (RL) can solve increasingly challenging tasks directly from visual observations, generalizing learned skills to novel environments remains very challenging. Extensive use of data augmentation…

Machine Learning · Computer Science 2021-12-10 Nicklas Hansen, Hao Su, Xiaolong Wang

Uncovering Instabilities in Variational-Quantum Deep Q-Networks

Deep Reinforcement Learning (RL) has considerably advanced over the past decade. At the same time, state-of-the-art RL algorithms require a large computational budget in terms of training time to converge. Recent work has started to…

Quantum Physics · Physics 2022-09-19 Maja Franz, Lucas Wolf, Maniraman Periyasamy, Christian Ufrecht, Daniel D. Scherer, Axel Plinge, Christopher Mutschler, Wolfgang Mauerer

Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation

Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component…

Machine Learning · Computer Science 2023-04-12 Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood

Quantum deep Q learning with distributed prioritized experience replay

This paper introduces the QDQN-DPER framework to enhance the efficiency of quantum reinforcement learning (QRL) in solving sequential decision tasks. The framework incorporates prioritized experience replay and asynchronous training into…

Quantum Physics · Physics 2023-04-20 Samuel Yen-Chi Chen

Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes

The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works…

Machine Learning · Computer Science 2023-04-19 Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine

Periodic Q-Learning

The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning…

Machine Learning · Computer Science 2020-02-25 Donghwan Lee, Niao He