Related papers: Regularized Q-Learning with Linear Function Approx…

Successive Over Relaxation Q-Learning

In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the…

Machine Learning · Computer Science 2019-06-17 Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed…

Machine Learning · Computer Science 2022-06-03 Andrea Zanette, Martin J. Wainwright

Reinforcement Learning with Function Approximation for Non-Markov Processes

We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions…

Machine Learning · Computer Science 2026-01-05 Ali Devran Kara

Periodic Regularized Q-Learning

In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this…

Machine Learning · Computer Science 2026-02-04 Hyukjun Yang, Han-Dong Lim, Donghwan Lee

Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

We study the convergence of $Q$-learning with linear function approximation. Our key contribution is the introduction of a novel multi-Bellman operator that extends the traditional Bellman operator. By exploring the properties of this…

Machine Learning · Computer Science 2023-10-02 Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting. This setting uses linear function approximation to capture value functions and unifies existing models like…

Machine Learning · Computer Science 2025-03-04 Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy, Wen Sun

Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address…

Machine Learning · Computer Science 2018-04-23 Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

Two-Step Q-Learning

Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical…

Machine Learning · Computer Science 2024-07-03 Antony Vijesh, Shreyas S R

Regularized Q-learning

Q-learning is a widely used algorithm in the reinforcement learning community. Under the lookup table setting, its convergence is well established. However, its behavior is known to be unstable under linear function approximation. This…

Machine Learning · Computer Science 2025-02-11 Han-Dong Lim, Donghwan Lee

Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set

$Q$-learning is one of the most fundamental reinforcement learning algorithms. It is widely believed that $Q$-learning with linear function approximation (i.e., linear $Q$-learning) suffers from possible divergence until the recent work…

Machine Learning · Computer Science 2025-05-28 Xinyu Liu, Zixuan Xie, Shangtong Zhang

Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem

The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even under linear function approximation settings. A key difficulty arises from a geometric mismatch: while the Bellman…

Machine Learning · Computer Science 2026-04-09 Hyukjun Yang, Han-Dong Lim, Donghwan Lee

Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality

Discrete time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning…

Optimization and Control · Mathematics 2025-07-01 Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an…

Machine Learning · Computer Science 2026-03-05 Shengbo Wang

Convex Q Learning in a Stochastic Environment: Extended Version

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of…

Optimization and Control · Mathematics 2023-09-12 Fan Lu, Sean Meyn

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximate-optimal…

Machine Learning · Computer Science 2019-06-07 Lin F. Yang, Mengdi Wang

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu, Quanquan Gu

Data-Efficient Quadratic Q-Learning Using LMIs

Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these…

Systems and Control · Electrical Eng. & Systems 2025-04-09 J. S. van Hulst, W. P. M. H. Heemels, D. J. Antunes

Stabilizing Q Learning Via Soft Mellowmax Operator

Learning complicated value functions in high-dimensional state spaces by function approximation is a challenging task, partly because the max-operator used in temporal difference updates can theoretically cause instability for most…

Machine Learning · Computer Science 2020-12-21 Yaozhong Gan, Zhe Zhang, Xiaoyang Tan

Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs

We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state action pairs we search for a corresponding set of linear weights that minimizes the…

Machine Learning · Computer Science 2013-09-27 Charles Tripp, Ross D. Shachter

Approximate Q-Learning for Controlled Diffusion Processes and its Near Optimality

We study a Q learning algorithm for continuous time stochastic control problems. The proposed algorithm uses the sampled state process by discretizing the state and control action spaces under piece-wise constant control processes. We show…

Optimization and Control · Mathematics 2023-03-10 Erhan Bayraktar, Ali Devran Kara