Related papers: Contextual Markov Decision Processes

A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints

We study online learning in episodic constrained Markov decision processes (CMDPs), where the learner aims at collecting as much reward as possible over the episodes, while satisfying some long-term constraints during the learning process.…

Machine Learning · Computer Science 2024-08-30 Jacopo Germano, Francesco Emanuele Stradi, Gianmarco Genalti, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Multi-Objective Planning with Contextual Lexicographic Reward Preferences

Autonomous agents are often required to plan under multiple objectives whose preference ordering varies based on context. The agent may encounter multiple contexts during its course of operation, each imposing a distinct lexicographic…

Artificial Intelligence · Computer Science 2025-11-06 Pulkit Rustagi, Yashwanthi Anand, Sandhya Saisubramanian

Reinforcement Learning with History-Dependent Dynamic Contexts

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts…

Machine Learning · Computer Science 2023-05-19 Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

Prospective Side Information for Latent MDPs

In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user's preferences, is not given. In…

Machine Learning · Computer Science 2023-10-12 Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

Configurable Markov Decision Processes

In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision…

Artificial Intelligence · Computer Science 2018-06-15 Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

Finite-Horizon Markov Decision Processes with Sequentially-Observed Transitions

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…

Optimization and Control · Mathematics 2015-07-07 Mahmoud El Chamie, Behcet Acikmese

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…

Optimization and Control · Mathematics 2015-07-08 Mahmoud El Chamie, Behcet Acikmese

Markov Decision Processes with Continuous Side Information

We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP…

Machine Learning · Statistics 2019-10-24 Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari

Relax but stay in control: from value to algorithms for online Markov decision processes

Online learning algorithms are designed to perform in non-stationary environments, but generally there is no notion of a dynamic state to model constraints on current and future actions as a function of past actions. State-based models are…

Machine Learning · Computer Science 2015-09-01 Peng Guan, Maxim Raginsky, Rebecca Willett

A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems

We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become…

Machine Learning · Computer Science 2025-05-05 Xin Chen, Yuze Chen, Yuan Zhou

Sample Complexity Characterization for Linear Contextual MDPs

Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. While CMDPs serve…

Machine Learning · Computer Science 2024-02-06 Junze Deng, Yuan Cheng, Shaofeng Zou, Yingbin Liang

Quantile Markov Decision Process

The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific…

Artificial Intelligence · Computer Science 2025-10-16 Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau

Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints

In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing…

Machine Learning · Computer Science 2024-09-27 Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Learning Adversarial MDPs with Stochastic Hard Constraints

We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints, under bandit feedback. We consider three scenarios. In the first one, we address general CMDPs, where we…

Machine Learning · Computer Science 2025-02-10 Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary…

Artificial Intelligence · Computer Science 2024-01-23 Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay

Block Contextual MDPs for Continual Learning

In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics is implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the…

Machine Learning · Computer Science 2021-10-15 Shagun Sodhani, Franziska Meier, Joelle Pineau, Amy Zhang

Hierarchical Decision-Making under Uncertainty: A Hybrid MDP and Chance-Constrained MPC Approach

This paper presents a hierarchical decision-making framework for autonomous systems operating under uncertainty, demonstrated through autonomous driving as a representative application. Surrounding agents are modeled using Hybrid Markov…

Systems and Control · Electrical Eng. & Systems 2026-03-19 Siyuan Li, Chengyuan Liu, Wen-Hua Chen

Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where…

Systems and Control · Electrical Eng. & Systems 2024-06-14 Donghwan Lee, Han-Dong Lim, Do Wan Kim

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…

Artificial Intelligence · Computer Science 2017-10-26 Dimitri Scheftelowitsch, Peter Buchholz, Vahid Hashemi, Holger Hermanns

Learning Algorithms for Verification of Markov Decision Processes

We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive…

Systems and Control · Electrical Eng. & Systems 2025-04-02 Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, Tobias Meggendorfer, David Parker, Mateusz Ujma