Related papers: Explainable Deterministic MDPs

Memoryless Exact Solutions for Deterministic MDPs with Sparse Rewards

We propose an algorithm for deterministic continuous Markov Decision Processes with sparse rewards that computes the optimal policy exactly with no dependency on the size of the state space. The algorithm has time complexity of $O( |R|^3…

Machine Learning · Computer Science 2018-05-21 Joshua R. Bertram , Peng Wei

Reinforcement Learning in Reward-Mixing MDPs

Learning a near optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP).…

Machine Learning · Computer Science 2022-02-01 Jeongyeol Kwon , Yonathan Efroni , Constantine Caramanis , Shie Mannor

Sufficiency of Deterministic Policies for Atomless Discounted and Uniformly Absorbing MDPs with Multiple Criteria

This paper studies Markov Decision Processes (MDPs) with atomless initial state distributions and atomless transition probabilities. Such MDPs are called atomless. The initial state distribution is considered to be fixed. We show that for…

Optimization and Control · Mathematics 2018-10-26 Eugene A. Feinberg , Aleksey B. Piunovskiy

Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning

Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing over robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision…

Machine Learning · Computer Science 2025-05-26 Maximilian Nägele , Jan Olle , Thomas Fösel , Remmy Zen , Florian Marquardt

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami

Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards

Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision making under uncertainty. The classical approaches for solving MDPs are well known and have been widely studied, some of which rely on…

Machine Learning · Computer Science 2018-05-18 Joshua R. Bertram , Xuxi Yang , Peng Wei

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…

Artificial Intelligence · Computer Science 2017-10-26 Dimitri Scheftelowitsch , Peter Buchholz , Vahid Hashemi , Holger Hermanns

Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable…

Artificial Intelligence · Computer Science 2013-01-07 Sylvie Thiebaux , Froduald Kabanza , John Slanley

Optimizing Quantiles in Preference-based Markov Decision Processes

In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile…

Artificial Intelligence · Computer Science 2016-12-02 Hugo Gilbert , Paul Weng , Yan Xu

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is well known that for any finite state Markov decision process (MDP) there is a memoryless deterministic policy that maximizes the expected reward. For partially observable Markov decision processes (POMDPs), optimal memoryless policies…

Optimization and Control · Mathematics 2016-02-16 Guido Montufar , Keyan Ghazi-Zahedi , Nihat Ay

J-P: MDP. FP. PP.: Characterizing Total Expected Rewards in Markov Decision Processes as Least Fixed Points with an Application to Operational Semantics of Probabilistic Programs (Technical Report)

Markov decision processes (MDPs) with rewards are a widespread and well-studied model for systems that make both probabilistic and nondeterministic choices. A fundamental result about MDPs is that their minimal and maximal expected rewards…

Logic in Computer Science · Computer Science 2024-11-26 Kevin Batz , Benjamin Lucien Kaminski , Christoph Matheja , Tobias Winkler

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…

Optimization and Control · Mathematics 2015-07-08 Mahmoud El Chamie , Behcet Acikmese

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly , Kishan Panaganti , Arnob Ghosh , Adam Wierman

MDP Geometry, Normalization and Reward Balancing Solvers

We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any…

Machine Learning · Computer Science 2025-03-06 Arsenii Mustafin , Aleksei Pakharev , Alex Olshevsky , Ioannis Ch. Paschalidis

Small Decision Trees for MDPs with Deductive Synthesis

Markov decision processes (MDPs) describe sequential decision-making processes; MDP policies return for every state in that process an advised action. Classical algorithms can efficiently compute policies that are optimal with respect to,…

Logic in Computer Science · Computer Science 2025-05-23 Roman Andriushchenko , Milan Češka , Sebastian Junges , Filip Macák

Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

In most common settings of Markov Decision Process (MDP), an agent evaluate a policy based on expectation of (discounted) sum of rewards. However in many applications this criterion might not be suitable from two perspective: first, in risk…

Artificial Intelligence · Computer Science 2017-05-11 Yan Li , Zhaohan Sun

Reward-Mixing MDPs with a Few Latent Contexts are Learnable

We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP…

Machine Learning · Computer Science 2022-10-07 Jeongyeol Kwon , Yonathan Efroni , Constantine Caramanis , Shie Mannor

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies…

Systems and Control · Electrical Eng. & Systems 2024-03-19 Saghar Adler , Vijay Subramanian

Policy Dispersion in Non-Markovian Environment

Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and…

Machine Learning · Computer Science 2024-06-04 Bohao Qu , Xiaofeng Cao , Jielong Yang , Hechang Chen , Chang Yi , Ivor W. Tsang , Yew-Soon Ong

Solving POMDPs by Searching the Space of Finite Policies

Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from…

Artificial Intelligence · Computer Science 2013-01-30 Nicolas Meuleau , Kee-Eung Kim , Leslie Pack Kaelbling , Anthony R. Cassandra