Related papers: Algorithm for Constrained Markov Decision Process …
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is…
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total…
Entropy regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of the entropy regularized problems. Standard first-order methods suffer from slow…
In many operations management problems, we need to make decisions sequentially to minimize the cost while satisfying certain constraints. One modeling approach to study such problems is constrained Markov decision process (CMDP). When…
Previous work has separately addressed different forms of action, state and action-state entropy regularization, pure exploration and space occupation. These problems have become extremely relevant for regularization, generalization,…
We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality and binary-tree data structures, the algorithm adaptively…
A constrained Markov decision process (CMDP) approach is developed for response-adaptive procedures in clinical trials with binary outcomes. The resulting CMDP class of Bayesian response -- adaptive procedures can be used to target a…
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…
We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of…
We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain…
This note summarizes the optimization formulations used in the study of Markov decision processes. We consider both the discounted and undiscounted processes under the standard and the entropy-regularized settings. For each setting, we…
We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no…
The paper studies a distributed constrained optimization problem, where multiple agents connected in a network collectively minimize the sum of individual objective functions subject to a global constraint being an intersection of the local…
We consider the problem of designing policies for Markov decision processes (MDPs) with dynamic coherent risk objectives and constraints. We begin by formulating the problem in a Lagrangian framework. Under the assumption that the risk…
We consider the constrained optimal control problem for the gradual-impulsive CTMDP model with the performance criteria being the expected total undiscounted costs (from the running cost and the cost from each time an impulse being…
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…
We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities.…
This work proposes an accelerated primal-dual dynamical system for affine constrained convex optimization and presents a class of primal-dual methods with nonergodic convergence rates. In continuous level, exponential decay of a novel…
We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes with the total expected discounted reward. Inspired by the monotone behavior of the contraction mappings in…
Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety…