Related papers: Bayesian Policy Optimization for Model Uncertainty
We introduce a Bayesian (deep) model-based reinforcement learning method (RoMBRL) that can capture model uncertainty to achieve sample-efficient policy optimisation. We propose to formulate the model-based policy optimisation problem as a…
We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating…
Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach…
The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of it to learn optimal decision-making strategies. In particular, a Bayesian…
Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies…
Informed and robust decision making in the face of uncertainty is critical for robots that perform physical tasks alongside people. We formulate this as Bayesian Reinforcement Learning over latent Markov Decision Processes (MDPs). While…
We address the challenge of quantifying Bayesian uncertainty and incorporating it in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics. Our approach provides a principled method to disentangle…
Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…
We cast episodic Markov decision process (MDP) planning as Bayesian inference over policies. A policy is treated as the latent variable and is assigned an unnormalized probability of optimality that is monotone in its expected return,…
Sequential decisions in volatile, high-stakes settings require more than maximizing expected return; they require principled uncertainty management. This paper presents the Uncertainty-Aware Markov Decision Process (UAMDP), a unified…
In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learnedfrom the estimated model is often far from the…
This paper addresses the problem of optimal control of robotic sensing systems aimed at autonomous information gathering in scenarios such as environmental monitoring, search and rescue, and surveillance and reconnaissance. The information…
Dynamic and evolving operational and economic environments present significant challenges for decision-making. We explore a simulation optimization problem characterized by non-stationary input distributions with regime-switching dynamics…
The problem of sequentially maximizing the expectation of a function seeks to maximize the expected value of a function of interest without having direct control on its features. Instead, the distribution of such features depends on a given…
We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions for these random parameters are…
Our goal is to compute a policy that guarantees improved return over a baseline policy even when the available MDP model is inaccurate. The inaccurate model may be constructed, for example, by system identification techniques when the true…
Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) formally model the problem of planning under world state uncertainty, but POMDPs with continuous actions and…
We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a…
A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to capture periodically varying statistical behavior. A novel Bayesian theory is developed for detecting a…
We study Bayesian optimal control of a general class of smoothly parameterized Markov decision problems. Since computing the optimal control is computationally expensive, we design an algorithm that trades off performance for computational…