Related papers: Configurable Markov Decision Processes

Finite-Horizon Markov Decision Processes with Sequentially-Observed Transitions

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…

Optimization and Control · Mathematics 2015-07-07 Mahmoud El Chamie, Behcet Acikmese

Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of the traditional Markov Decision Processes (MDPs) to model the real-world scenarios in which there is the possibility to intervene in the…

Machine Learning · Computer Science 2024-02-22 Alberto Maria Metelli

Contextual Markov Decision Processes

We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts.…

Machine Learning · Statistics 2015-02-10 Assaf Hallak, Dotan Di Castro, Shie Mannor

Multiple-Environment Markov Decision Processes

We introduce Multi-Environment Markov Decision Processes (MEMDPs) which are MDPs with a set of probabilistic transition functions. The goal in a MEMDP is to synthesize a single controller with guaranteed performances against all…

Logic in Computer Science · Computer Science 2014-12-04 Jean-François Raskin, Ocan Sankur

A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems

We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become…

Machine Learning · Computer Science 2025-05-05 Xin Chen, Yuze Chen, Yuan Zhou

Parameter-Independent Strategies for pMDPs via POMDPs

Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition…

Logic in Computer Science · Computer Science 2018-06-14 Sebastian Arming, Ezio Bartocci, Krishnendu Chatterjee, Joost-Pieter Katoen, Ana Sokolova

Policy Space Identification in Configurable Environments

We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy. We introduce an approach based on statistical testing to identify the set of policy…

Machine Learning · Computer Science 2019-09-10 Alberto Maria Metelli, Guglielmo Manneschi, Marcello Restelli

Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes

Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and…

Machine Learning · Computer Science 2023-06-14 Luca Sabbioni, Francesco Corda, Marcello Restelli

Safety-Constrained Policy Transfer with Successor Features

In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications…

Machine Learning · Computer Science 2022-11-11 Zeyu Feng, Bowen Zhang, Jianxin Bi, Harold Soh

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…

Artificial Intelligence · Computer Science 2017-10-26 Dimitri Scheftelowitsch, Peter Buchholz, Vahid Hashemi, Holger Hermanns

Data-Efficient Safe Policy Improvement Using Parametric Structure

Safe policy improvement (SPI) is an offline reinforcement learning problem in which a new policy that reliably outperforms the behavior policy with high confidence needs to be computed using only a dataset and the behavior policy. Markov…

Artificial Intelligence · Computer Science 2025-08-20 Kasper Engelen, Guillermo A. Pérez, Marnix Suilen

Reconnaissance and Planning algorithm for constrained MDP

Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this…

Machine Learning · Computer Science 2019-09-23 Shin-ichi Maeda, Hayato Watahiki, Shintarou Okada, Masanori Koyama

Safe Reinforcement Learning in Constrained Markov Decision Processes

Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision…

Machine Learning · Computer Science 2020-08-18 Akifumi Wachi, Yanan Sui

Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis

Reinforcement learning in non-stationary environments is challenging due to abrupt and unpredictable changes in dynamics, often causing traditional algorithms to fail to converge. However, in many real-world cases, non-stationarity has some…

Machine Learning · Computer Science 2025-03-25 Mohsen Amiri, Sindri Magnússon

MDPs with a State Sensing Cost

In many practical sequential decision-making problems, tracking the state of the environment incurs a sensing/communication/computation cost. In these settings, the agent's interaction with its environment includes the additional component…

Machine Learning · Computer Science 2026-04-16 Vansh Kapoor, Jayakrishnan Nair

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary…

Artificial Intelligence · Computer Science 2024-01-23 Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay

Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation

This paper proposes a formal approach to online learning and planning for agents operating in a priori unknown, time-varying environments. The proposed method computes the maximally likely model of the environment, given the observations…

Machine Learning · Computer Science 2021-02-09 Melkior Ornik, Ufuk Topcu

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems,…

Optimization and Control · Mathematics 2024-05-07 Sara Klein, Simon Weissmann, Leif Döring

Constrained Active Classification Using Partially Observable Markov Decision Processes

In this work, we study the problem of actively classifying the attributes of dynamical systems characterized as a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the…

Systems and Control · Electrical Eng. & Systems 2023-01-06 Bo Wu, Niklas Lauffer, Mohamadreza Ahmadi, Suda Bharadwaj, Zhe Xu, Ufuk Topcu

Time-Optimal Navigation in Uncertain Environments with High-Level Specifications

Mixed observable Markov decision processes (MOMDPs) are a modeling framework for autonomous systems described by both fully and partially observable states. In this work, we study the problem of synthesizing a control policy for MOMDPs that…

Systems and Control · Electrical Eng. & Systems 2021-03-03 Ugo Rosolia, Mohamadreza Ahmadi, Richard M. Murray, Aaron D. Ames