Related papers: Sample-Efficient Constrained Reinforcement Learnin…

Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs

This paper focuses on learning a Constrained Markov Decision Process (CMDP) via general parameterized policies. We propose a Primal-Dual based Regularized Accelerated Natural Policy Gradient (PDR-ANPG) algorithm that uses entropy and…

Machine Learning · Computer Science 2026-05-04 Washim Uddin Mondal, Vaneet Aggarwal

Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Process. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an…

Machine Learning · Computer Science 2024-02-06 Washim Uddin Mondal, Vaneet Aggarwal

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

We consider the problem of constrained Markov decision process (CMDP) in continuous state-action spaces where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural…

Machine Learning · Computer Science 2024-05-20 Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is…

Optimization and Control · Mathematics 2021-10-22 Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan

Achieving Instance-dependent Sample Complexity for Constrained Markov Decision Process

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are…

Machine Learning · Computer Science 2025-11-19 Jiashuo Jiang, Yinyu Ye

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard…

Machine Learning · Computer Science 2022-10-19 Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term…

Artificial Intelligence · Computer Science 2018-02-20 Qingkai Liang, Fanyu Que, Eytan Modiano

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon…

Optimization and Control · Mathematics 2025-10-16 Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…

Optimization and Control · Mathematics 2023-08-08 Hyeong Soo Chang

Primal-Dual Sample Complexity Bounds for Constrained Markov Decision Processes with Multiple Constraints

This paper addresses the challenge of solving Constrained Markov Decision Processes (CMDPs) with $d > 1$ constraints when the transition dynamics are unknown, but samples can be drawn from a generative model. We propose a model-based…

Machine Learning · Computer Science 2025-03-11 Max Buckley, Konstantinos Papathanasiou, Andreas Spanopoulos

Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization

We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on…

Machine Learning · Computer Science 2026-03-10 Anirudh Satheesh, Pankaj Kumar Barman, Washim Uddin Mondal, Vaneet Aggarwal

Markov Decision Processes with Long-Term Average Constraints

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims…

Machine Learning · Computer Science 2022-06-22 Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety…

Machine Learning · Computer Science 2022-07-15 Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average…

Machine Learning · Computer Science 2024-10-31 Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the…

Artificial Intelligence · Computer Science 2017-04-07 Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology,…

Machine Learning · Computer Science 2024-08-26 Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs

Recent advances have significantly improved our understanding of the sample complexity of learning in average-reward Markov decision processes (AMDPs) under the generative model. However, much less is known about the constrained…

Machine Learning · Computer Science 2025-09-23 Yukuan Wei, Xudong Li, Lin F. Yang

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, $\gamma$-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint violation. We consider the…

Machine Learning · Computer Science 2022-04-12 Arushi Jain, Sharan Vaswani, Reza Babanezhad, Csaba Szepesvari, Doina Precup

Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

In this paper we consider the problem of learning an $\epsilon$-optimal policy for a discounted Markov Decision Process (MDP). Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold…

Machine Learning · Computer Science 2020-12-25 Zihan Zhang, Yuan Zhou, Xiangyang Ji

Anytime-Constrained Reinforcement Learning

We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no…

Machine Learning · Computer Science 2024-06-14 Jeremy McMahan, Xiaojin Zhu