Related papers: Achieving Instance-dependent Sample Complexity for…

Near-Optimal Sample Complexity for Online Constrained MDPs

Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used…

Machine Learning · Computer Science 2026-02-18 Chang Liu, Yunfan Li, Lin F. Yang

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

We consider online learning for episodic stochastically constrained Markov decision processes (CMDPs), which plays a central role in ensuring the safety of reinforcement learning. Here the loss function can vary arbitrarily across the…

Machine Learning · Computer Science 2021-10-19 Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang

Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process…

Machine Learning · Computer Science 2021-03-03 Aria HasanzadeZonuzy, Archana Bura, Dileep Kalathil, Srinivas Shakkottai

Primal-Dual Sample Complexity Bounds for Constrained Markov Decision Processes with Multiple Constraints

This paper addresses the challenge of solving Constrained Markov Decision Processes (CMDPs) with $d > 1$ constraints when the transition dynamics are unknown, but samples can be drawn from a generative model. We propose a model-based…

Machine Learning · Computer Science 2025-03-11 Max Buckley, Konstantinos Papathanasiou, Andreas Spanopoulos

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety…

Machine Learning · Computer Science 2022-07-15 Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

Learning Adversarial MDPs with Stochastic Hard Constraints

We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints, under bandit feedback. We consider three scenarios. In the first one, we address general CMDPs, where we…

Machine Learning · Computer Science 2025-02-10 Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings,…

Machine Learning · Computer Science 2022-07-14 Fan Chen, Junyu Zhang, Zaiwen Wen

Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints

In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing…

Machine Learning · Computer Science 2024-09-27 Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with…

Machine Learning · Computer Science 2022-10-28 Bingyan Wang, Yuling Yan, Jianqing Fan

A Primal-Dual Approach to Constrained Markov Decision Processes

In many operations management problems, we need to make decisions sequentially to minimize the cost while satisfying certain constraints. One modeling approach to study such problems is the constrained Markov decision process (CMDP). When…

Optimization and Control · Mathematics 2021-01-27 Yi Chen, Jing Dong, Zhaoran Wang

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is…

Optimization and Control · Mathematics 2021-10-22 Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2023-09-28 Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Sample-Efficient Constrained Reinforcement Learning with General Parameterization

We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain…

Machine Learning · Computer Science 2024-11-01 Washim Uddin Mondal, Vaneet Aggarwal

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the…

Machine Learning · Computer Science 2021-02-02 Qinbo Bai, Vaneet Aggarwal, Ather Gattami

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2024-05-30 Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs

In order to make good decisions under uncertainty, an agent must learn from observations. To do so, two of the most common frameworks are Contextual Bandits and Markov Decision Processes (MDPs). In this paper, we study whether there exist…

Machine Learning · Computer Science 2019-11-05 Andrea Zanette, Emma Brunskill

Reconnaissance and Planning algorithm for constrained MDP

Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this…

Machine Learning · Computer Science 2019-09-23 Shin-ichi Maeda, Hayato Watahiki, Shintarou Okada, Masanori Koyama

Anytime-Constrained Reinforcement Learning

We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no…

Machine Learning · Computer Science 2024-06-14 Jeremy McMahan, Xiaojin Zhu

Logarithmic regret bounds for continuous-time average-reward Markov decision processes

We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a…

Machine Learning · Computer Science 2024-07-03 Xuefeng Gao, Xun Yu Zhou