Related papers: Anytime-Constrained Reinforcement Learning

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Discounted continuous-time constrained Markov decision processes in Polish spaces

This paper is devoted to studying constrained continuous-time Markov decision processes (MDPs) in the class of randomized policies depending on state histories. The transition rates may be unbounded, the reward and costs are admitted to be…

Probability · Mathematics 2012-01-04 Xianping Guo, Xinyuan Song

Polynomial-Time Approximability of Constrained Reinforcement Learning

We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal…

Data Structures and Algorithms · Computer Science 2025-02-12 Jeremy McMahan

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami

Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints

We consider synthesis of control policies that maximize the probability of satisfying given temporal logic specifications in unknown, stochastic environments. We model the interaction between the system and its environment as a Markov…

Systems and Control · Computer Science 2014-05-01 Jie Fu, Ufuk Topcu

Finite-State Approximations to Discounted and Average Cost Constrained Markov Decision Processes

In this paper, we consider the finite-state approximation of a discrete-time constrained Markov decision process (MDP) under the discounted and average cost criteria. Using the linear programming formulation of the constrained discounted…

Optimization and Control · Mathematics 2018-07-10 Naci Saldi

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which…

Artificial Intelligence · Computer Science 2020-02-28 Tomas Brazdil, Krishnendu Chatterjee, Petr Novotny, Jiri Vahala

Anytime-Competitive Reinforcement Learning with Policy Prior

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random…

Machine Learning · Computer Science 2024-02-06 Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, Shaolei Ren

Provably Efficient Sample Complexity for Robust CMDP

We study the problem of learning policies that maximize cumulative reward while satisfying safety constraints, even when the real environment differs from a simulator or nominal model. We focus on robust constrained Markov decision…

Machine Learning · Computer Science 2025-11-12 Sourav Ganguly, Arnob Ghosh

Convex Approximations of Random Constrained Markov Decision Processes

Constrained Markov decision processes (CMDPs) are used as a decision-making framework to study the long-run performance of a stochastic system. It is well-known that a stationary optimal policy of a CMDP problem under discounted cost…

Optimization and Control · Mathematics 2025-06-02 V Varagapriya, Vikas Vikram Singh, Abdel Lisser

A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions

The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of…

Artificial Intelligence · Computer Science 2023-04-19 Majid Khonji

On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints

We study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a reachability criterion with a discounted…

Optimization and Control · Mathematics 2021-03-18 Yagiz Savas, Christos K. Verginis, Michael Hibbard, Ufuk Topcu

Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to…

Logic in Computer Science · Computer Science 2025-11-18 Linus Heck, Filip Macák, Milan Češka, Sebastian Junges

Reconnaissance and Planning algorithm for constrained MDP

Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this…

Machine Learning · Computer Science 2019-09-23 Shin-ichi Maeda, Hayato Watahiki, Shintarou Okada, Masanori Koyama

Robust Deterministic Policies for Markov Decision Processes under Budgeted Uncertainty

This paper studies the computation of robust deterministic policies for Markov Decision Processes (MDPs) in the Lightning Does Not Strike Twice (LDST) model of Mannor, Mebel and Xu (ICML '12). In this model, designed to provide robustness…

Optimization and Control · Mathematics 2024-12-18 Fei Wu, Erik Demeulemeester, Jannik Matuschke

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…

Optimization and Control · Mathematics 2023-08-08 Hyeong Soo Chang

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $|\mathcal{S}|$ and a finite action space $|\mathcal{A}|$. We show that any randomized algorithm needs a…

Computational Complexity · Computer Science 2017-05-24 Yichen Chen, Mengdi Wang

Achieving Instance-dependent Sample Complexity for Constrained Markov Decision Process

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are…

Machine Learning · Computer Science 2025-11-19 Jiashuo Jiang, Yinyu Ye

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes…

Optimization and Control · Mathematics 2019-06-17 Yagiz Savas, Melkior Ornik, Murat Cubuktepe, Mustafa O. Karabag, Ufuk Topcu

Policy Testing in Markov Decision Processes

We study the policy testing problem in discounted Markov decision processes (MDPs) in the fixed-confidence setting under a generative model with static sampling. The goal is to decide whether the value of a given policy exceeds a specified…

Machine Learning · Statistics 2026-04-21 Kaito Ariu, Po-An Wang, Alexandre Proutiere, Kenshi Abe