Related papers: Robust Policy Optimization with Baseline Guarantee…

200 papers

An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and…

Machine Learning · Statistics 2016-07-14 Marek Petrik, Yinlam Chow, Mohammad Ghavamzadeh

We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the…

Machine Learning · Computer Science 2021-03-01 Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to…

Logic in Computer Science · Computer Science 2025-11-18 Linus Heck, Filip Macák, Milan Češka, Sebastian Junges

In this paper, we show how a simulated Markov decision process (MDP) built by the so-called baseline policies can be used to compute a different policy, namely the simulated optimal policy, for which the performance of this…

Optimization and Control · Mathematics 2014-10-13 Yinlam Chow, Mohammad Ghavamzadeh

Our goal is to build robust optimization problems for making decisions based on complex data from the past. In robust optimization (RO) generally, the goal is to create a policy for decision-making that is robust to our uncertainty about…

Optimization and Control · Mathematics 2014-07-07 Theja Tulabandhula, Cynthia Rudin

Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a continuous Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a…

Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline…

Machine Learning · Computer Science 2021-01-01 Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Model-based policy optimization is a well-established framework for designing reliable and high-performance controllers across a wide range of control applications. Recently, this approach has been extended to model predictive control…

Systems and Control · Electrical Eng. & Systems 2026-04-15 Riccardo Zuliani, Efe C. Balta, John Lygeros

In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision…

Machine Learning · Computer Science 2020-10-13 Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to…

Systems and Control · Electrical Eng. & Systems 2024-05-16 Luke Rickard, Alessandro Abate, Kostas Margellos

Robust optimization safeguards decisions against uncertainty by optimizing against worst-case scenarios, yet its effectiveness hinges on a prespecified robustness level that is often chosen ad hoc, leading to either insufficient…

Machine Learning · Statistics 2026-02-02 Wenbin Zhou, Shixiang Zhu

Robust MDPs (RMDPs) can be used to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by the ambiguity set: the set of plausible transition…

Machine Learning · Computer Science 2019-02-21 Marek Petrik, Reazul Hasan Russel

Safety in stochastic control systems, which are subject to random noise with a known probability distribution, aims to compute policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution…

Systems and Control · Electrical Eng. & Systems 2025-11-12 Saber Omidi, Marek Petrik, Se Young Yoon, Momotaz Begum

Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify…

Machine Learning · Computer Science 2022-08-02 Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is…

Machine Learning · Computer Science 2022-05-17 Kishan Panaganti, Dileep Kalathil

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight…

Machine Learning · Computer Science 2022-09-29 Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does…

Machine Learning · Computer Science 2025-09-23 Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance…

Machine Learning · Computer Science 2019-12-02 Qi Zhou, Houqiang Li, Jie Wang

Safety and robustness are two desired properties for any reinforcement learning algorithm. CMDPs can handle additional safety constraints and RMDPs can perform well under model uncertainties. In this paper, we propose to unite these two…

Machine Learning · Computer Science 2021-08-21 Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar, Radu Corcodel