English
Related papers

Related papers: Regularized Q-Learning with Linear Function Approx…

200 papers

In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the…

Machine Learning · Computer Science 2019-06-17 Chandramouli Kamanchi , Raghuram Bharadwaj Diddigi , Shalabh Bhatnagar

The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed…

Machine Learning · Computer Science 2022-06-03 Andrea Zanette , Martin J. Wainwright

We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions…

Machine Learning · Computer Science 2026-01-05 Ali Devran Kara

In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this…

Machine Learning · Computer Science 2026-02-04 Hyukjun Yang , Han-Dong Lim , Donghwan Lee

We study the convergence of $Q$-learning with linear function approximation. Our key contribution is the introduction of a novel multi-Bellman operator that extends the traditional Bellman operator. By exploring the properties of this…

Machine Learning · Computer Science 2023-10-02 Diogo S. Carvalho , Pedro A. Santos , Francisco S. Melo

We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting. This setting uses linear function approximation to capture value functions and unifies existing models like…

Machine Learning · Computer Science 2025-03-04 Runzhe Wu , Ayush Sekhari , Akshay Krishnamurthy , Wen Sun

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address…

Machine Learning · Computer Science 2018-04-23 Alec Koppel , Ekaterina Tolstaya , Ethan Stump , Alejandro Ribeiro

Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical…

Machine Learning · Computer Science 2024-07-03 Antony Vijesh , Shreyas S R

Q-learning is widely used algorithm in reinforcement learning community. Under the lookup table setting, its convergence is well established. However, its behavior is known to be unstable with the linear function approximation case. This…

Machine Learning · Computer Science 2025-02-11 Han-Dong Lim , Donghwan Lee

$Q$-learning is one of the most fundamental reinforcement learning algorithms. It is widely believed that $Q$-learning with linear function approximation (i.e., linear $Q$-learning) suffers from possible divergence until the recent work…

Machine Learning · Computer Science 2025-05-28 Xinyu Liu , Zixuan Xie , Shangtong Zhang

The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even under linear function approximation settings. A key difficulty arises from a geometric mismatch: while the Bellman…

Machine Learning · Computer Science 2026-04-09 Hyukjun Yang , Han-Dong Lim , Donghwan Lee

Discrete time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning…

Optimization and Control · Mathematics 2025-07-01 Arnulf Jentzen , Konrad Kleinberg , Thomas Kruse

We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an…

Machine Learning · Computer Science 2026-03-05 Shengbo Wang

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of…

Optimization and Control · Mathematics 2023-09-12 Fan Lu , Sean Meyn

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximate-optimal…

Machine Learning · Computer Science 2019-06-07 Lin F. Yang , Mengdi Wang

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu , Quanquan Gu

Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these…

Systems and Control · Electrical Eng. & Systems 2025-04-09 J. S. van Hulst , W. P. M. H. Heemels , D. J. Antunes

Learning complicated value functions in high dimensional state space by function approximation is a challenging task, partially due to that the max-operator used in temporal difference updates can theoretically cause instability for most…

Machine Learning · Computer Science 2020-12-21 Yaozhong Gan , Zhe Zhang , Xiaoyang Tan

We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state action pairs we search for a corresponding set of linear weights that minimizes the…

Machine Learning · Computer Science 2013-09-27 Charles Tripp , Ross D. Shachter

We study a Q learning algorithm for continuous time stochastic control problems. The proposed algorithm uses the sampled state process by discretizing the state and control action spaces under piece-wise constant control processes. We show…

Optimization and Control · Mathematics 2023-03-10 Erhan Bayraktar , Ali Devran Kara
‹ Prev 1 2 3 10 Next ›