Deep Double Q-learning

Prabhat Nagarajan; Martha White; Marlos C. Machado

Machine Learning; Artificial Intelligence (v2, 2026-05-18)

Abstract

Double Q-learning is a classical control algorithm that mitigates the maximization bias of Q-learning. To do so, it explicitly trains two independent action-value functions and uses them to decouple action-selection and action-evaluation when computing bootstrap targets. Double DQN adapts target bootstrap decoupling to deep reinforcement learning (RL), but explicitly trains only a single action-value function and does not fully decouple its estimators. Consequently, the two estimators remain correlated, and overestimation persists. In this paper, we introduce Deep Double Q-learning (DDQL), a deep RL algorithm that explicitly trains two Q-functions through Double Q-learning. DDQL stabilizes training through a combination of techniques, including lower replay ratios, longer target network update intervals, and shared layers. Across 57 Atari 2600 games, DDQL improves aggregate performance over Double DQN, outperforming it on 47 games while further reducing overestimation. In addition, we study key design choices when adapting Double Q-learning to deep RL, including the network architecture, replay ratio, and minibatch sampling strategies.
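The abstract contrasts two ways of decoupling action selection from action evaluation. Below is a minimal Python sketch of that contrast, assuming tabular action values stored as plain lists. It is not the DDQL implementation from the paper, which trains neural networks with shared layers, replay, and target networks. The function names `double_q_update`, `double_dqn_target`, and `bias_demo` are hypothetical. `double_q_update` follows the classical tabular Double Q-learning rule: a coin flip picks which of two tables to update, that table selects the greedy next action, and the other table evaluates it. `double_dqn_target` computes the Double DQN target, where the online network selects and the target network evaluates. `bias_demo` illustrates maximization bias in an assumed toy setting where every true action value is 0 and the estimates are unit-variance Gaussian noise.

```python
import random


def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_update(q_a, q_b, s, a, r, s_next, done,
                    alpha=0.1, gamma=0.99, rng=random):
    """One tabular Double Q-learning step (illustrative sketch).

    q_a, q_b: dicts mapping state -> list of action values.
    rng: any object with a .random() method (module or random.Random).
    """
    # A fair coin decides which table is updated on this step.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a
    if done:
        target = r
    else:
        a_star = argmax(learner[s_next])                # selection: updated table
        target = r + gamma * evaluator[s_next][a_star]  # evaluation: other table
    learner[s][a] += alpha * (target - learner[s][a])


def double_dqn_target(q_online_next, q_target_next, r, done, gamma=0.99):
    """Double DQN target: online values select, target values evaluate."""
    if done:
        return r
    return r + gamma * q_target_next[argmax(q_online_next)]


def bias_demo(n_actions=10, trials=2000, seed=0):
    """Compare max-based and double estimates when all true values are 0.

    Returns (mean of max_a Q_A(a), mean of Q_B(argmax_a Q_A(a))),
    where Q_A and Q_B are independent unit-Gaussian noise vectors.
    """
    rng = random.Random(seed)
    single = double = 0.0
    for _ in range(trials):
        q_a = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        single += max(q_a)           # same noise selects and evaluates
        double += q_b[argmax(q_a)]   # independent noise evaluates
    return single / trials, double / trials


if __name__ == "__main__":
    single, double = bias_demo()
    print(f"max estimator: {single:.2f}, double estimator: {double:.2f}")
```

Running the script should report a single-estimator mean of roughly 1.5 (the expected maximum of ten independent standard normals) and a double-estimator mean near 0. The demo draws the two estimates independently; the abstract's point is that Double DQN's online and target estimators are not independent, so overestimation can persist.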

Keywords

reinforcement learning

Cite

@article{arxiv.2507.00275,
  title  = {Deep Double Q-learning},
  author = {Prabhat Nagarajan and Martha White and Marlos C. Machado},
  journal= {arXiv preprint arXiv:2507.00275},
  year   = {2026}
}

Comments

44 pages