Related papers: Reinforcement Learning with Function Approximation…

Learning POMDPs with Linear Function Approximation and Finite Memory

We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback…

Optimization and Control · Mathematics 2025-05-22 Ali Devran Kara

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu, Quanquan Gu

Regularized Q-Learning with Linear Function Approximation

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

Approximating Euclidean by Imprecise Markov Decision Processes

Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite state imprecise Markov decision processes can be used to approximate the behavior of these infinite…

Artificial Intelligence · Computer Science 2020-06-29 Manfred Jaeger, Giorgio Bacci, Giovanni Bacci, Kim Guldstrand Larsen, Peter Gjøl Jensen

Reinforcement Learning under Model Mismatch

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the…

Machine Learning · Computer Science 2017-11-10 Aurko Roy, Huan Xu, Sebastian Pokutta

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an…

Machine Learning · Computer Science 2026-03-05 Shengbo Wang

Reinforcement Learning: Stochastic Approximation Algorithms for Markov Decision Processes

This article presents a short and concise description of stochastic approximation algorithms in reinforcement learning of Markov decision processes. The algorithms can also be used as a suboptimal method for partially observed Markov…

Optimization and Control · Mathematics 2015-12-25 Vikram Krishnamurthy

Commit to the Bit: Reactive Reinforcement Learning Done Right

Reinforcement learning algorithms are commonly analyzed (and designed) under the Markov assumption. This is unrealistic, as most environments encountered in practice are either partially observable, or require function approximation that…

Machine Learning · Computer Science 2026-05-28 Onno Eberhard, Claire Vernade, Michael Muehlebach

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type…

Machine Learning · Computer Science 2026-05-11 Gugan Thoppe, L. A. Prashanth, Ankur Naskar, Sanjay Bhat

Reinforcement Learning for Joint Optimization of Multiple Rewards

Finding optimal policies which maximize long term rewards of Markov Decision Processes requires the use of dynamic programming and backward induction to solve the Bellman optimality equation. However, many real-world problems require…

Machine Learning · Computer Science 2023-01-10 Mridul Agarwal, Vaneet Aggarwal

Reinforcement Learning in Non-Markovian Environments

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of…

Systems and Control · Electrical Eng. & Systems 2024-02-15 Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

Regularized Q-learning

Q-learning is a widely used algorithm in the reinforcement learning community. Under the lookup table setting, its convergence is well established. However, its behavior is known to be unstable with linear function approximation. This…

Machine Learning · Computer Science 2025-02-11 Han-Dong Lim, Donghwan Lee

Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the necessary attention of the scientific community, considering stochastic stopping time, the…

Machine Learning · Computer Science 2024-03-26 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

Reinforcement Learning with Function Approximation: From Linear to Nonlinear

Function approximation has been an indispensable component in modern reinforcement learning algorithms designed to tackle problems with large state spaces in high dimensions. This paper reviews recent results on error analysis for these…

Machine Learning · Computer Science 2024-02-27 Jihao Long, Jiequn Han

Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning

Motivated by applications in reinforcement learning (RL), we study a nonlinear stochastic approximation (SA) algorithm under Markovian noise, and establish its finite-sample convergence bounds under various stepsizes. Specifically, we show…

Optimization and Control · Mathematics 2022-01-27 Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri

Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods

We investigate reinforcement learning in the setting of Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Applications include, for example, the control of a large number of robots…

Optimization and Control · Mathematics 2025-04-30 René Carmona, Mathieu Laurière, Zongjun Tan

Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, the faster and slower recursions have non-additive controlled Markov noise…

Machine Learning · Computer Science 2020-12-03 Prasenjit Karmakar

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…

Machine Learning · Computer Science 2022-10-17 Anna Winnicki, R. Srikant

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami