Related papers: Policy Gradient for Rectangular Robust Markov Deci…

Policy Gradient in Robust MDPs with Global Convergence Guarantee

Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but…

Machine Learning · Computer Science · 2024-05-15 · Qiuhao Wang, Chin Pang Ho, Marek Petrik

A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes

Robust Markov Decision Processes (RMDPs) have recently been recognized as a valuable and promising approach to discovering a policy with creditable performance, particularly in the presence of a dynamic environment and estimation errors in…

Optimization and Control · Mathematics · 2024-06-04 · Zhenwei Lin, Chenyu Xue, Qi Deng, Yinyu Ye

Ranking Policy Gradient

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values while it usually involves an extensive search over the state-action space and unstable optimization.…

Machine Learning · Computer Science · 2019-11-27 · Kaixiang Lin, Jiayu Zhou

Sample Complexity of Robust Reinforcement Learning with a Generative Model

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is…

Machine Learning · Computer Science · 2022-05-17 · Kishan Panaganti, Dileep Kalathil

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through…

Machine Learning · Computer Science · 2024-05-16 · David M. Bossens

Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that…

Optimization and Control · Mathematics · 2025-09-30 · Mengmeng Li, Daniel Kuhn, Tobias Sutter

Policy Gradient for Robust Markov Decision Processes

We develop a generic policy gradient method with the global optimality guarantee for robust Markov Decision Processes (MDPs). While policy gradient methods are widely used for solving dynamic decision problems due to their scalable and…

Machine Learning · Computer Science · 2024-11-01 · Qiuhao Wang, Shaohang Xu, Chin Pang Ho, Marek Petrik

Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model

Reinforcement learning (RL) shows great potential in sequential decision-making. At present, mainstream RL algorithms are data-driven, which usually yield better asymptotic performance but much slower convergence compared with model-driven…

Machine Learning · Computer Science · 2024-02-27 · Yang Guan, Jingliang Duan, Shengbo Eben Li, Jie Li, Jianyu Chen, Bo Cheng

Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty

In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision…

Machine Learning · Computer Science · 2020-10-13 · Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model…

Machine Learning · Computer Science · 2022-05-17 · Yue Wang, Shaofeng Zou

First-order Policy Optimization for Robust Markov Decision Process

We consider the problem of solving robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that…

Machine Learning · Computer Science · 2023-06-13 · Yan Li, Guanghui Lan, Tuo Zhao

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

In high-stake scenarios like medical treatment and auto-piloting, it's risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from its inherent…

Machine Learning · Computer Science · 2022-03-16 · Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

A policy gradient approach for optimization of smooth risk measures

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of…

Machine Learning · Computer Science · 2024-06-25 · Nithia Vijayan, Prashanth L. A

Policy Gradient Methods for Distortion Risk Measures

We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision…

Machine Learning · Computer Science · 2024-02-06 · Nithia Vijayan, Prashanth L. A

Reparameterized Policy Learning for Multimodal Trajectory Optimization

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used…

Machine Learning · Computer Science · 2023-07-21 · Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…

Machine Learning · Computer Science · 2020-10-19 · Santiago Paternain, Juan Andres Bazerque, Alejandro Ribeiro

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science · 2026-02-10 · Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

Robust Markov decision processes (MDPs) provide a general framework to model decision problems where the system dynamics are changing or only partially known. Efficient methods for some sa-rectangular robust MDPs exist, using its…

Artificial Intelligence · Computer Science · 2022-10-06 · Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes

Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes a parameterized policy model for an expected return using gradient ascent. While PG can work well even in non-Markovian environments, it may encounter plateaus or…

Machine Learning · Computer Science · 2024-07-08 · Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang

Towards Provable Log Density Policy Gradient

Policy gradient methods are a vital ingredient behind the success of modern reinforcement learning. Modern policy gradient methods, although successful, introduce a residual error in gradient estimation. In this work, we argue that this…

Machine Learning · Computer Science · 2024-03-05 · Pulkit Katdare, Anant Joshi, Katherine Driggs-Campbell