English
Related papers

Related papers: Sample-Efficient Constrained Reinforcement Learnin…

200 papers

This paper focuses on learning a Constrained Markov Decision Process (CMDP) via general parameterized policies. We propose a Primal-Dual based Regularized Accelerated Natural Policy Gradient (PDR-ANPG) algorithm that uses entropy and…

Machine Learning · Computer Science 2026-05-04 Washim Uddin Mondal , Vaneet Aggarwal

We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Process. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an…

Machine Learning · Computer Science 2024-02-06 Washim Uddin Mondal , Vaneet Aggarwal

We consider the problem of constrained Markov decision process (CMDP) in continuous state-actions spaces where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural…

Machine Learning · Computer Science 2024-05-20 Qinbo Bai , Amrit Singh Bedi , Vaneet Aggarwal

The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is…

Optimization and Control · Mathematics 2021-10-22 Tianjiao Li , Ziwei Guan , Shaofeng Zou , Tengyu Xu , Yingbin Liang , Guanghui Lan

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are…

Machine Learning · Computer Science 2025-11-19 Jiashuo Jiang , Yinyu Ye

We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard…

Machine Learning · Computer Science 2022-10-19 Ruida Zhou , Tao Liu , Dileep Kalathil , P. R. Kumar , Chao Tian

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term…

Artificial Intelligence · Computer Science 2018-02-20 Qingkai Liang , Fanyu Que , Eytan Modiano

We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon…

Optimization and Control · Mathematics 2025-10-16 Dongsheng Ding , Kaiqing Zhang , Jiali Duan , Tamer Başar , Mihailo R. Jovanović

We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…

Optimization and Control · Mathematics 2023-08-08 Hyeong Soo Chang

This paper addresses the challenge of solving Constrained Markov Decision Processes (CMDPs) with $d > 1$ constraints when the transition dynamics are unknown, but samples can be drawn from a generative model. We propose a model-based…

Machine Learning · Computer Science 2025-03-11 Max Buckley , Konstantinos Papathanasiou , Andreas Spanopoulos

We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on…

Machine Learning · Computer Science 2026-03-10 Anirudh Satheesh , Pankaj Kumar Barman , Washim Uddin Mondal , Vaneet Aggarwal

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims…

Machine Learning · Computer Science 2022-06-22 Mridul Agarwal , Qinbo Bai , Vaneet Aggarwal

Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety…

Machine Learning · Computer Science 2022-07-15 Qinbo Bai , Amrit Singh Bedi , Mridul Agarwal , Alec Koppel , Vaneet Aggarwal

This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average…

Machine Learning · Computer Science 2024-10-31 Qinbo Bai , Washim Uddin Mondal , Vaneet Aggarwal

In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account \emph{risk}, i.e., increased awareness of events of small probability and high consequences. Accordingly, the…

Artificial Intelligence · Computer Science 2017-04-07 Yinlam Chow , Mohammad Ghavamzadeh , Lucas Janson , Marco Pavone

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology,…

Machine Learning · Computer Science 2024-08-26 Vaneet Aggarwal , Washim Uddin Mondal , Qinbo Bai

Recent advances have significantly improved our understanding of the sample complexity of learning in average-reward Markov decision processes (AMDPs) under the generative model. However, much less is known about the constrained…

Machine Learning · Computer Science 2025-09-23 Yukuan Wei , Xudong Li , Lin F. Yang

We study policy optimization in an infinite horizon, $\gamma$-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint violation. We consider the…

Machine Learning · Computer Science 2022-04-12 Arushi Jain , Sharan Vaswani , Reza Babanezhad , Csaba Szepesvari , Doina Precup

In this paper we consider the problem of learning an $\epsilon$-optimal policy for a discounted Markov Decision Process (MDP). Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold…

Machine Learning · Computer Science 2020-12-25 Zihan Zhang , Yuan Zhou , Xiangyang Ji

We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no…

Machine Learning · Computer Science 2024-06-14 Jeremy McMahan , Xiaojin Zhu
‹ Prev 1 2 3 10 Next ›