Related papers: Provably Efficient UCB-type Algorithms For Learnin…

Causal Markov Decision Processes: Learning Good Interventions Efficiently

We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and…

Machine Learning · Statistics 2021-02-16 Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Robust Anytime Learning of Markov Decision Processes

Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in…

Artificial Intelligence · Computer Science 2023-06-21 Marnix Suilen, Thiago D. Simão, David Parker, Nils Jansen

Robust Batch Policy Learning in Markov Decision Processes

We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a…

Statistics Theory · Mathematics 2021-11-11 Zhengling Qi, Peng Liao

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration. This challenge leads to a number…

Machine Learning · Computer Science 2020-10-27 Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu

UAMDP: Uncertainty-Aware Markov Decision Process for Risk-Constrained Reinforcement Learning from Probabilistic Forecasts

Sequential decisions in volatile, high-stakes settings require more than maximizing expected return; they require principled uncertainty management. This paper presents the Uncertainty-Aware Markov Decision Process (UAMDP), a unified…

Machine Learning · Computer Science 2025-12-19 Michal Koren, Or Peretz, Tai Dinh, Philip S. Yu

Accelerating Point-Based Value Iteration via Active Sampling of Belief Points and Gaussian Process Regression

Partially Observable Markov Decision Processes (POMDPs) are fundamental to decision-making under uncertainty. We introduce a novel scalable approach to accelerate upper bound estimation in Point-Based Value Iteration (PBVI) algorithms, the…

Optimization and Control · Mathematics 2025-03-13 Siqiong Zhou, Ashif S. Iquebal, Esma S. Gel

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient

Markov decision processes (MDP) are a well-established model for sequential decision-making in the presence of probabilities. In robust MDP (RMDP), every action is associated with an uncertainty set of probability distributions, modelling…

Artificial Intelligence · Computer Science 2024-12-16 Tobias Meggendorfer, Maximilian Weininger, Patrick Wienhöft

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

In view of its power in extracting feature representation, contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL), leading to efficient policy learning in various…

Machine Learning · Computer Science 2024-04-16 Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

Sequential Monte Carlo for Policy Optimization in Continuous POMDPs

Optimal decision-making under partial observability requires agents to balance reducing uncertainty (exploration) against pursuing immediate objectives (exploitation). In this paper, we introduce a novel policy optimization framework for…

Machine Learning · Computer Science 2025-12-05 Hany Abdulsamad, Sahel Iqbal, Simo Särkkä

Distributionally Robust Optimization for Sequential Decision Making

The distributionally robust Markov Decision Process (MDP) approach asks for a distributionally robust policy that achieves the maximal expected total reward under the most adversarial distribution of uncertain parameters. In this paper, we…

Systems and Control · Computer Science 2018-10-10 Zhi Chen, Pengqian Yu, William B. Haskell

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with…

Machine Learning · Computer Science 2024-06-12 Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which…

Artificial Intelligence · Computer Science 2020-02-28 Tomas Brazdil, Krishnendu Chatterjee, Petr Novotny, Jiri Vahala

PAC Bounds for Discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). For the upper bound we make the assumption that each action leads to at most two…

Machine Learning · Computer Science 2013-05-17 Tor Lattimore, Marcus Hutter

On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets

A basic model in sequential decision making is the Markov decision process (MDP), which is extended to Robust MDPs (RMDPs) by allowing uncertainty in transition probabilities and optimizing against the worst-case transition probabilities…

Computational Complexity · Computer Science 2026-05-11 Ali Asadi, Krishnendu Chatterjee, Alipasha Montaseri, Ali Shafiee

Model-Based Learning of Near-Optimal Finite-Window Policies in POMDPs

We study model-based learning of finite-window policies in tabular partially observable Markov decision processes (POMDPs). A common approach to learning under partial observability is to approximate unbounded history dependencies using…

Machine Learning · Computer Science 2026-04-02 Philip Jordan, Maryam Kamgarpour

Robust Finite-State Controllers for Uncertain POMDPs

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic…

Artificial Intelligence · Computer Science 2021-11-02 Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, Ufuk Topcu

Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…

Performance · Computer Science 2017-09-08 Jan Křetínský, Tobias Meggendorfer

Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data driven (adaptive) policies for Markovian decision process with unknown transition probabilities. We provide a brief survey of…

Machine Learning · Computer Science 2019-09-16 Wesley Cowan, Michael N. Katehakis, Daniel Pirutinsky

PAC Reinforcement Learning for Predictive State Representations

In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models such as…

Machine Learning · Computer Science 2022-08-16 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Scalable First-Order Methods for Robust MDPs

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves…

Optimization and Control · Mathematics 2021-01-18 Julien Grand-Clément, Christian Kroer