Related papers: Bayesian Policy Optimization for Model Uncertainty

Bayes-Adaptive Deep Model-Based Policy Optimisation

We introduce a Bayesian (deep) model-based reinforcement learning method (RoMBRL) that can capture model uncertainty to achieve sample-efficient policy optimisation. We propose to formulate the model-based policy optimisation problem as a…

Robotics · Computer Science 2021-01-06 Tai Hoang, Ngo Anh Vien

Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes

We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating…

Machine Learning · Computer Science 2018-10-09 Gilwoo Lee, Sanjiban Choudhury, Brian Hou, Siddhartha S. Srinivasa

Bayesian Risk-Sensitive Policy Optimization For MDPs With General Loss Functions

Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach…

Machine Learning · Computer Science 2025-10-02 Xiaoshuang Wang, Yifan Lin, Enlu Zhou

Bayesian learning of the optimal action-value function in a Markov decision process

The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of it to learn optimal decision-making strategies. In particular, a Bayesian…

Machine Learning · Statistics 2025-05-06 Jiaqi Guo, Chon Wai Ho, Sumeetpal S. Singh

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies…

Systems and Control · Electrical Eng. & Systems 2024-03-19 Saghar Adler, Vijay Subramanian

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

Informed and robust decision making in the face of uncertainty is critical for robots that perform physical tasks alongside people. We formulate this as Bayesian Reinforcement Learning over latent Markov Decision Processes (MDPs). While…

Robotics · Computer Science 2020-02-11 Gilwoo Lee, Brian Hou, Sanjiban Choudhury, Siddhartha S. Srinivasa

Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs

We address the challenge of quantifying Bayesian uncertainty and incorporating it in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics. Our approach provides a principled method to disentangle…

Machine Learning · Computer Science 2024-06-05 Filippo Valdettaro, A. Aldo Faisal

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…

Artificial Intelligence · Computer Science 2017-10-26 Dimitri Scheftelowitsch, Peter Buchholz, Vahid Hashemi, Holger Hermanns

MDP Planning as Policy Inference

We cast episodic Markov decision process (MDP) planning as Bayesian inference over policies. A policy is treated as the latent variable and is assigned an unnormalized probability of optimality that is monotone in its expected return,…

Machine Learning · Computer Science 2026-04-14 David Tolpin

UAMDP: Uncertainty-Aware Markov Decision Process for Risk-Constrained Reinforcement Learning from Probabilistic Forecasts

Sequential decisions in volatile, high-stakes settings require more than maximizing expected return; they require principled uncertainty management. This paper presents the Uncertainty-Aware Markov Decision Process (UAMDP), a unified…

Machine Learning · Computer Science 2025-12-19 Michal Koren, Or Peretz, Tai Dinh, Philip S. Yu

Bayesian regularization of empirical MDPs

In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the…

Machine Learning · Computer Science 2022-09-22 Samarth Gupta, Daniel N. Hill, Lexing Ying, Inderjit Dhillon

Myopic Policy Bounds for Information Acquisition POMDPs

This paper addresses the problem of optimal control of robotic sensing systems aimed at autonomous information gathering in scenarios such as environmental monitoring, search and rescue, and surveillance and reconnaissance. The information…

Systems and Control · Computer Science 2016-01-28 Mikko Lauri, Nikolay Atanasov, George J. Pappas, Risto Ritala

An Online Non-Stationary Simulation Optimization Approach Based on Regime Switching

Dynamic and evolving operational and economic environments present significant challenges for decision-making. We explore a simulation optimization problem characterized by non-stationary input distributions with regime-switching dynamics…

Optimization and Control · Mathematics 2025-08-19 Jianglin Xia, Haowei Wang, Songhao Wang, Szu Hui Ng

Sequential Decision Making on Unmatched Data using Bayesian Kernel Embeddings

The problem of sequentially maximizing the expectation of a function seeks to maximize the expected value of a function of interest without having direct control on its features. Instead, the distribution of such features depends on a given…

Machine Learning · Statistics 2022-10-26 Diego Martinez-Taboada, Dino Sejdinovic

Scenario-Based Verification of Uncertain MDPs

We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions for these random parameters are…

Logic in Computer Science · Computer Science 2020-02-26 Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, Ufuk Topcu

Robust Policy Optimization with Baseline Guarantees

Our goal is to compute a policy that guarantees improved return over a baseline policy even when the available MDP model is inaccurate. The inaccurate model may be constructed, for example, by system identification techniques when the true…

Optimization and Control · Mathematics 2015-06-17 Yinlam Chow, Marek Petrik, Mohammad Ghavamzadeh

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) formally model the problem of planning under world state uncertainty, but POMDPs with continuous actions and…

Robotics · Computer Science 2020-07-08 Dicong Qiu, Yibiao Zhao, Chris L. Baker

Robust Batch Policy Learning in Markov Decision Processes

We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a…

Statistics Theory · Mathematics 2021-11-11 Zhengling Qi, Peng Liao

A Bayesian Theory of Change Detection in Statistically Periodic Random Processes

A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to capture periodically varying statistical behavior. A novel Bayesian theory is developed for detecting a…

Signal Processing · Electrical Eng. & Systems 2019-04-09 Taposh Banerjee, Prudhvi Gurram, Gene Whipps

Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm

We study Bayesian optimal control of a general class of smoothly parameterized Markov decision problems. Since computing the optimal control is computationally expensive, we design an algorithm that trades off performance for computational…

Machine Learning · Computer Science 2014-06-17 Yasin Abbasi-Yadkori, Csaba Szepesvari