Related papers: A Single-Loop Robust Policy Gradient Method for Ro…

Policy Gradient in Robust MDPs with Global Convergence Guarantee

Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but…

Machine Learning · Computer Science 2024-05-15 Qiuhao Wang, Chin Pang Ho, Marek Petrik

Policy Gradient for Robust Markov Decision Processes

We develop a generic policy gradient method with the global optimality guarantee for robust Markov Decision Processes (MDPs). While policy gradient methods are widely used for solving dynamic decision problems due to their scalable and…

Machine Learning · Computer Science 2024-11-01 Qiuhao Wang, Shaohang Xu, Chin Pang Ho, Marek Petrik

Stochastic Variance-Reduced Policy Gradient

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods…

Machine Learning · Computer Science 2018-06-15 Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

Policy Gradient for Rectangular Robust Markov Decision Processes

Policy gradient methods have become a standard for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally…

Machine Learning · Computer Science 2023-12-12 Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor

First-order Policy Optimization for Robust Markov Decision Process

We consider the problem of solving robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that…

Machine Learning · Computer Science 2023-06-13 Yan Li, Guanghui Lan, Tuo Zhao

On the convex formulations of robust Markov decision processes

Robust Markov decision processes (MDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy…

Optimization and Control · Mathematics 2023-12-14 Julien Grand-Clément, Marek Petrik

Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that…

Optimization and Control · Mathematics 2025-09-30 Mengmeng Li, Daniel Kuhn, Tobias Sutter

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient

Markov decision processes (MDP) are a well-established model for sequential decision-making in the presence of probabilities. In robust MDP (RMDP), every action is associated with an uncertainty set of probability distributions, modelling…

Artificial Intelligence · Computer Science 2024-12-16 Tobias Meggendorfer, Maximilian Weininger, Patrick Wienhöft

Sample Complexity of Robust Reinforcement Learning with a Generative Model

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is…

Machine Learning · Computer Science 2022-05-17 Kishan Panaganti, Dileep Kalathil

Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty

In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision…

Machine Learning · Computer Science 2020-10-13 Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation…

Optimization and Control · Mathematics 2024-01-18 Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through…

Machine Learning · Computer Science 2024-05-16 David M. Bossens

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm…

Machine Learning · Computer Science 2026-04-27 Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo

Robust Markov Decision Processes on Continuous State Spaces

We study infinite-horizon robust Markov decision processes (MDPs) on continuous state spaces with structured rectangular ambiguity set. The proposed ambiguity set falls within the convex hull of unknown generating kernels. We utilize the…

Optimization and Control · Mathematics 2026-05-28 Mengmeng Li, Yifan Hu, Daniel Kuhn, Yan Li

On the Complexity of Robust Markov Decision Processes and Bisimulation Metrics

Robust Markov decision processes (RMDPs) extend standard Markov decision processes (MDPs) to account for uncertainty in the transition probabilities. RMDPs have an uncertainty set that defines a set of possible transition functions, each of…

Logic in Computer Science · Computer Science 2026-04-30 Marnix Suilen, Guillermo A. Pérez

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization -- an algorithmic…

Machine Learning · Statistics 2022-09-13 Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, Yuejie Chi

Best-Effort Policies for Robust Markov Decision Processes

We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an…

Artificial Intelligence · Computer Science 2025-11-20 Alessandro Abate, Thom Badings, Giuseppe De Giacomo, Francesco Fabiano

Solving Robust MDPs through No-Regret Dynamics

Reinforcement Learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. However, solving Markov Decision Processes that are robust to changes is…

Machine Learning · Computer Science 2024-06-21 Etash Kumar Guha

Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs

We consider (stochastic) softmax policy gradient (PG) methods for bandits and tabular Markov decision processes (MDPs). While the PG objective is non-concave, recent research has used the objective's smoothness and gradient domination…

Machine Learning · Computer Science 2024-10-01 Michael Lu, Matin Aghaei, Anant Raj, Sharan Vaswani

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Decision-making under distribution shift is a central challenge in reinforcement learning (RL), where training and deployment environments differ. We study this problem through the lens of robust Markov decision processes (RMDPs), which…

Machine Learning · Computer Science 2025-10-17 Jingwen Gu, Yiting He, Zhishuai Liu, Pan Xu