Related papers: Self-correcting Q-Learning

Deep Reinforcement Learning with Double Q-learning

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can…

Machine Learning · Computer Science 2015-12-10 Hado van Hasselt, Arthur Guez, David Silver

On the Estimation Bias in Double Q-Learning

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing…

Machine Learning · Computer Science 2022-01-17 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Cross Learning in Deep Q-Networks

In this work, we propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where the overestimation is exaggerated…

Artificial Intelligence · Computer Science 2020-09-30 Xing Wang, Alexander Vinel

Deep Double Q-learning

Double Q-learning is a classical control algorithm that mitigates the maximization bias of Q-learning. To do so, it explicitly trains two independent action-value functions and uses them to decouple action-selection and action-evaluation…

Machine Learning · Computer Science 2026-05-18 Prabhat Nagarajan, Martha White, Marlos C. Machado

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias…

Machine Learning · Computer Science 2021-08-10 Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

ADDQ: Adaptive Distributional Double Q-Learning

Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down the convergence of $Q$-learning and actor-critic methods. One of the reasons for the success of modern RL algorithms is partially a direct or indirect…

Machine Learning · Computer Science 2025-06-26 Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, Martin Slowik

Finite-Time Analysis of Simultaneous Double Q-learning

$Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-learning update. To address this issue, double…

Machine Learning · Computer Science 2026-01-13 Hyunjun Na, Donghwan Lee

Decorrelated Double Q-learning

Q-learning with value function approximation may perform poorly because of overestimation bias and imprecise estimates. Specifically, overestimation bias arises from the maximum operator over noisy estimates, which is exaggerated using…

Machine Learning · Computer Science 2020-06-15 Gang Chen

Ensemble Bootstrapping for Q-Learning

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by…

Machine Learning · Computer Science 2021-04-21 Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Using Deep Q-Learning to Control Optimization Hyperparameters

We present a novel definition of the reinforcement learning state, actions and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs…

Optimization and Control · Mathematics 2016-06-21 Samantha Hansen

Addressing Function Approximation Error in Actor-Critic Methods

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting…

Artificial Intelligence · Computer Science 2018-10-23 Scott Fujimoto, Herke van Hoof, David Meger

Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets

The optimistic nature of the Q-learning target leads to an overestimation bias, which is an inherent problem associated with standard $Q$-learning. Such a bias fails to account for the possibility of low returns, particularly in risky…

Machine Learning · Computer Science 2021-11-05 Thommen George Karimpanal, Hung Le, Majid Abdolshah, Santu Rana, Sunil Gupta, Truyen Tran, Svetha Venkatesh

Modified Double DQN: addressing stability

Inspired by the Double Q-learning algorithm, the Double-DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm. The DDQN has successfully shown both theoretically and empirically…

Artificial Intelligence · Computer Science 2024-10-30 Shervin Halat, Mohammad Mehdi Ebadzadeh, Kiana Amani

Deep Reinforcement Learning with Weighted Q-Learning

Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be…

Machine Learning · Computer Science 2022-06-14 Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi

Suppressing Overestimation in Q-Learning through Adversarial Behaviors

The goal of this paper is to propose a new Q-learning algorithm with a dummy adversarial player, which is called dummy adversarial Q-learning (DAQ), that can effectively regulate the overestimation bias in standard Q-learning. With the…

Machine Learning · Computer Science 2024-10-01 HyeAnn Lee, Donghwan Lee

Adapting Double Q-Learning for Continuous Reinforcement Learning

The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics, primarily addressing the consequences of overestimation rather than its fundamental origins.…

Machine Learning · Computer Science 2023-09-27 Arsenii Kuznetsov

Two-Step Q-Learning

Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical…

Machine Learning · Computer Science 2024-07-03 Antony Vijesh, Shreyas S R

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks

Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) problems. Clipped Double Q-learning, as an effective variant of Double Q-learning, employs the clipped double estimator to approximate the…

Machine Learning · Computer Science 2021-05-04 Haobo Jiang, Jin Xie, Jian Yang

Smoothed Q-learning

In Reinforcement Learning the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double…

Machine Learning · Computer Science 2023-03-16 David Barber

Action Candidate Driven Clipped Double Q-learning for Discrete and Continuous Action Tasks

Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) problems. Clipped Double Q-learning, as an effective variant of Double Q-learning, employs the clipped double estimator to approximate the…

Machine Learning · Computer Science 2022-03-23 Haobo Jiang, Jin Xie, Jian Yang