Related papers: Two-Step Q-Learning

A Multi-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games

An interesting iterative procedure is proposed to solve two-player zero-sum Markov games. Under a suitable assumption, the boundedness of the proposed iterates is obtained theoretically. Using results from stochastic approximation, the…

Machine Learning · Computer Science 2025-09-23 Shreyas S R, Antony Vijesh

Smoothed Q-learning

In Reinforcement Learning the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double…

Machine Learning · Computer Science 2023-03-16 David Barber

Finite-Time Analysis for Double Q-learning

Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values…

Machine Learning · Computer Science 2020-10-13 Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang

Self-correcting Q-Learning

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an…

Machine Learning · Computer Science 2021-02-03 Rong Zhu, Mattia Rigotti

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed…

Machine Learning · Computer Science 2022-06-03 Andrea Zanette, Martin J. Wainwright

Regularized Q-Learning with Linear Function Approximation

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

Regularized Q-learning through Robust Averaging

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often…

Optimization and Control · Mathematics 2024-05-30 Peter Schmitt-Förster, Tobias Sutter

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu, Quanquan Gu

A Discrete-Time Switching System Analysis of Q-learning

This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time…

Optimization and Control · Mathematics 2024-08-23 Donghwan Lee, Jianghai Hu, Niao He

Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity

An improvement of Q-learning is proposed in this paper. It is different from classic Q-learning in that the similarity between different states and actions is considered in the proposed method. During the training, a new updating mechanism…

Artificial Intelligence · Computer Science 2021-06-03 Wei Liao, Xiaohui Wei, Jizhou Lai

Convex Q Learning in a Stochastic Environment: Extended Version

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of…

Optimization and Control · Mathematics 2023-09-12 Fan Lu, Sean Meyn

Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration

Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small…

Machine Learning · Statistics 2026-01-28 Hwanwoo Kim, Eric Laber

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this…

Machine Learning · Statistics 2024-01-26 Yixuan Zhang, Qiaomin Xie

Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets

The optimistic nature of the Q-learning target leads to an overestimation bias, which is an inherent problem associated with standard $Q$-learning. Such a bias fails to account for the possibility of low returns, particularly in risky…

Machine Learning · Computer Science 2021-11-05 Thommen George Karimpanal, Hung Le, Majid Abdolshah, Santu Rana, Sunil Gupta, Truyen Tran, Svetha Venkatesh

Stochastic Approximation for Risk-aware Markov Decision Processes

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point…

Optimization and Control · Mathematics 2019-12-05 Wenjie Huang, William B. Haskell

Empirical Q-Value Iteration

We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as…

Optimization and Control · Mathematics 2019-01-31 Dileep Kalathil, Vivek S. Borkar, Rahul Jain

Stochastic Primal-Dual Q-Learning

In this work, we present a new model-free and off-policy reinforcement learning (RL) algorithm that is capable of finding a near-optimal policy with state-action observations from arbitrary behavior policies. Our algorithm, called the…

Optimization and Control · Mathematics 2025-07-21 Narim Jeong, Donghwan Lee, Niao He

Self-Imitation Learning via Generalized Lower Bound Q-learning

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and…

Machine Learning · Computer Science 2021-02-16 Yunhao Tang

On the Estimation Bias in Double Q-Learning

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing…

Machine Learning · Computer Science 2022-01-17 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Learned Collusion

Q-learning can be described as an all-purpose automaton that provides estimates (Q-values) of the continuation values associated with each available action and follows the naive policy of almost always choosing the action with highest…

Theoretical Economics · Economics 2025-05-29 Olivier Compte