Related papers: Robust Policy Optimization with Baseline Guarantee…

Safe Policy Improvement by Minimizing Robust Baseline Regret

An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and…

Machine Learning · Statistics 2016-07-14 Marek Petrik, Yinlam Chow, Mohammad Ghavamzadeh

Optimizing Percentile Criterion Using Robust MDPs

We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the…

Machine Learning · Computer Science 2021-03-01 Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to…

Logic in Computer Science · Computer Science 2025-11-18 Linus Heck, Filip Macák, Milan Češka, Sebastian Junges

Constrained Stochastic Optimal Control with a Baseline Performance Guarantee

In this paper, we show how a simulated Markov decision process (MDP) built by the so-called baseline policies can be used to compute a different policy, namely the simulated optimal policy, for which the performance of this…

Optimization and Control · Mathematics 2014-10-13 Yinlam Chow, Mohammad Ghavamzadeh

Robust Optimization using Machine Learning for Uncertainty Sets

Our goal is to build robust optimization problems for making decisions based on complex data from the past. In robust optimization (RO) generally, the goal is to create a policy for decision-making that is robust to our uncertainty about…

Optimization and Control · Mathematics 2014-07-07 Theja Tulabandhula, Cynthia Rudin

Bayesian Policy Optimization for Model Uncertainty

Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a continuous Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a…

Robotics · Computer Science 2019-05-09 Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa

Safe Policy Improvement with an Estimated Baseline Policy

Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline…

Machine Learning · Computer Science 2021-01-01 Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Policy Optimization with Differentiable MPC: Convergence Analysis under Uncertainty

Model-based policy optimization is a well-established framework for designing reliable and high-performance controllers across a wide range of control applications. Recently, this approach has been extended to model predictive control…

Systems and Control · Electrical Eng. & Systems 2026-04-15 Riccardo Zuliani, Efe C. Balta, John Lygeros

Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty

In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision…

Machine Learning · Computer Science 2020-10-13 Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

Learning Robust Policies for Uncertain Parametric Markov Decision Processes

Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to…

Systems and Control · Electrical Eng. & Systems 2024-05-16 Luke Rickard, Alessandro Abate, Kostas Margellos

Calibrating Decision Robustness via Inverse Conformal Risk Control

Robust optimization safeguards decisions against uncertainty by optimizing against worst-case scenarios, yet its effectiveness hinges on a prespecified robustness level that is often chosen ad hoc, leading to either insufficient…

Machine Learning · Statistics 2026-02-02 Wenbin Zhou, Shixiang Zhu

Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

Robust MDPs (RMDPs) can be used to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by the ambiguity set---the set of plausible transition…

Machine Learning · Computer Science 2019-02-21 Marek Petrik, Reazul Hasan Russell

Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs

Safety in stochastic control systems, which are subject to random noise with a known probability distribution, aims to compute policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution…

Systems and Control · Electrical Eng. & Systems 2025-11-12 Saber Omidi, Marek Petrik, Se Young Yoon, Momotaz Begum

Safe Policy Improvement Approaches on Discrete Markov Decision Processes

Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify…

Machine Learning · Computer Science 2022-08-02 Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft

Sample Complexity of Robust Reinforcement Learning with a Generative Model

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is…

Machine Learning · Computer Science 2022-05-17 Kishan Panaganti, Dileep Kalathil

Online Policy Optimization for Robust MDP

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight…

Machine Learning · Computer Science 2022-09-29 Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does…

Machine Learning · Computer Science 2025-09-23 Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance…

Machine Learning · Computer Science 2019-12-02 Qi Zhou, Houqiang Li, Jie Wang

Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty

Safety and robustness are two desired properties for any reinforcement learning algorithm. CMDPs can handle additional safety constraints and RMDPs can perform well under model uncertainties. In this paper, we propose to unite these two…

Machine Learning · Computer Science 2021-08-21 Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar, Radu Corcodel