Related papers: Policy Evaluation in Continuous MDPs with Efficien…

Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address…

Machine Learning · Computer Science 2018-04-23 Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

Optimal policy evaluation using kernel-based temporal difference methods

We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD)…

Machine Learning · Statistics 2021-09-27 Yaqi Duan, Mengdi Wang, Martin J. Wainwright

Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we…

Robotics · Computer Science 2020-06-04 Junhong Xu, Kai Yin, Lantao Liu

Nonlinear Monte Carlo methods with polynomial runtime for Bellman equations of discrete time high-dimensional stochastic optimal control problems

Discrete time stochastic optimal control problems and Markov decision processes (MDPs), respectively, serve as fundamental models for problems that involve sequential decision making under uncertainty and as such constitute the theoretical…

Optimization and Control · Mathematics 2023-03-08 Christian Beck, Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

Modelling transition dynamics in MDPs with RKHS embeddings

We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This…

Machine Learning · Computer Science 2012-06-22 Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton

Addressing Finite-Horizon MDPs via Low-Rank Tensor Value Approximation

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs), are not…

Machine Learning · Computer Science 2026-05-14 Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. Over finite time horizons, such problems are relevant, for instance, to optimal stopping or specific supply chain problems,…

Optimization and Control · Mathematics 2024-05-07 Sara Klein, Simon Weissmann, Leif Döring

Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains

We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used to address autonomous navigation and control in unstructured off-road environments. In contrast to most…

Robotics · Computer Science 2024-02-08 Junhong Xu, Kai Yin, Zheng Chen, Jason M. Gregory, Ethan A. Stump, Lantao Liu

Partial Policy Iteration for L1-Robust Markov Decision Processes

Robust Markov decision processes (MDPs) allow the computation of reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the…

Machine Learning · Computer Science 2020-06-18 Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs

Markov decision processes (MDPs) are viewed as the optimization of an objective function over certain linear operators on general function spaces. A new result is established for the existence of optimal policies in general MDPs,…

Machine Learning · Computer Science 2026-04-01 Abhishek Gupta, Aditya Mahajan

Bayesian Risk-Sensitive Policy Optimization For MDPs With General Loss Functions

Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach…

Machine Learning · Computer Science 2025-10-02 Xiaoshuang Wang, Yifan Lin, Enlu Zhou

Modeling transition dynamics in MDPs with RKHS embeddings of conditional distributions

We propose a new, nonparametric approach to estimating the value function in reinforcement learning. This approach makes use of a recently developed representation of conditional distributions as functions in a reproducing kernel Hilbert…

Machine Learning · Computer Science 2012-10-19 Steffen Grünewälder, Luca Baldassarre, Massimiliano Pontil, Arthur Gretton, Guy Lever

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…

Machine Learning · Computer Science 2020-10-19 Santiago Paternain, Juan Andres Bazerque, Alejandro Ribeiro

On Linear Convergence of Policy Gradient Methods for Finite MDPs

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been…

Machine Learning · Computer Science 2021-12-14 Jalaj Bhandari, Daniel Russo

Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces

Reinforcement learning consists of finding policies that maximize an expected cumulative long-term reward in a Markov decision process with unknown transition probabilities and instantaneous rewards. In this paper, we consider the problem…

Systems and Control · Computer Science 2018-07-31 Santiago Paternain, Juan Andrés Bazerque, Austin Small, Alejandro Ribeiro

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying linear Bellman completeness, a fundamental setting where the Bellman backup of any linear value function remains…

Machine Learning · Computer Science 2026-03-25 Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…

Machine Learning · Computer Science 2024-03-12 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

Budgeted Reinforcement Learning in Continuous State Space

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below…

Machine Learning · Computer Science 2019-05-29 Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

We present the first finite-sample analysis of policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior work in this setting has established only asymptotic convergence guarantees, leaving open the question of…

Machine Learning · Statistics 2025-12-11 Yang Xu, Washim Uddin Mondal, Vaneet Aggarwal

Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo

Designing efficient learning algorithms with complexity guarantees for Markov decision processes (MDPs) with large or continuous state and action spaces remains a fundamental challenge. We address this challenge for entropy-regularized MDPs…

Machine Learning · Computer Science 2025-06-05 Matthieu Meunier, Christoph Reisinger, Yufei Zhang