Related papers: Policy Evaluation in Continuous MDPs with Efficien…

200 papers

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address…

Machine Learning · Computer Science 2018-04-23 Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD)…

Machine Learning · Statistics 2021-09-27 Yaqi Duan, Mengdi Wang, Martin J. Wainwright

We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we…

Robotics · Computer Science 2020-06-04 Junhong Xu, Kai Yin, Lantao Liu

Discrete time stochastic optimal control problems and Markov decision processes (MDPs), respectively, serve as fundamental models for problems that involve sequential decision making under uncertainty and as such constitute the theoretical…

Optimization and Control · Mathematics 2023-03-08 Christian Beck, Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This…

Machine Learning · Computer Science 2012-06-22 Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not…

Machine Learning · Computer Science 2026-05-14 Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems,…

Optimization and Control · Mathematics 2024-05-07 Sara Klein, Simon Weissmann, Leif Döring

We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used to address autonomous navigation and control in unstructured off-road environments. In contrast to most…

Robotics · Computer Science 2024-02-08 Junhong Xu, Kai Yin, Zheng Chen, Jason M. Gregory, Ethan A. Stump, Lantao Liu

Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the…

Machine Learning · Computer Science 2020-06-18 Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

Markov decision processes (MDPs) are viewed as an optimization of an objective function over certain linear operators over general function spaces. A new existence result is established for optimal policies in general MDPs,…

Machine Learning · Computer Science 2026-04-01 Abhishek Gupta, Aditya Mahajan

Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach…

Machine Learning · Computer Science 2025-10-02 Xiaoshuang Wang, Yifan Lin, Enlu Zhou

We propose a new, nonparametric approach to estimating the value function in reinforcement learning. This approach makes use of a recently developed representation of conditional distributions as functions in a reproducing kernel Hilbert…

Machine Learning · Computer Science 2012-10-19 Steffen Grünewälder, Luca Baldassarre, Massimiliano Pontil, Arthur Gretton, Guy Lever

Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…

Machine Learning · Computer Science 2020-10-19 Santiago Paternain, Juan Andres Bazerque, Alejandro Ribeiro

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been…

Machine Learning · Computer Science 2021-12-14 Jalaj Bhandari, Daniel Russo

Reinforcement learning consists of finding policies that maximize an expected cumulative long-term reward in a Markov decision process with unknown transition probabilities and instantaneous rewards. In this paper, we consider the problem…

Systems and Control · Computer Science 2018-07-31 Santiago Paternain, Juan Andrés Bazerque, Austin Small, Alejandro Ribeiro

We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying linear Bellman completeness, a fundamental setting where the Bellman backup of any linear value function remains…

Machine Learning · Computer Science 2026-03-25 Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…

Machine Learning · Computer Science 2024-03-12 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below…

Machine Learning · Computer Science 2019-05-29 Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

We present the first finite-sample analysis of policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior work in this setting has established only asymptotic convergence guarantees, leaving open the question of…

Machine Learning · Statistics 2025-12-11 Yang Xu, Washim Uddin Mondal, Vaneet Aggarwal

Designing efficient learning algorithms with complexity guarantees for Markov decision processes (MDPs) with large or continuous state and action spaces remains a fundamental challenge. We address this challenge for entropy-regularized MDPs…

Machine Learning · Computer Science 2025-06-05 Matthieu Meunier, Christoph Reisinger, Yufei Zhang