Related papers: Optimistic Planning by Regularized Dynamic Program…
We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…
Regularization of control policies using entropy can be instrumental in adjusting predictability of real-world systems. Applications benefiting from such approaches range from, e.g., cybersecurity, which aims at maximal unpredictability, to…
Approximate dynamic programming is a popular method for solving large Markov decision processes. This paper describes a new class of approximate dynamic programming (ADP) methods- distributionally robust ADP-that address the curse of…
This paper discusses algorithms for solving Markov decision processes (MDPs) that have monotone optimal policies. We propose a two-stage alternating convex optimization scheme that can accelerate the search for an optimal policy by…
Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which…
We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term. The extant…
This paper studies a finite-horizon Markov decision problem with information-theoretic constraints, where the goal is to minimize directed information from the controlled source process to the control process, subject to stage-wise cost…
Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm which enables to optimize an alternative objective: the probability that the gain is…
We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we…
We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…
In this paper, we consider the finite-state approximation of a discrete-time constrained Markov decision process (MDP) under the discounted and average cost criteria. Using the linear programming formulation of the constrained discounted…
We propose a method of approximating multivariate Gaussian probabilities using dynamic programming. We show that solving the optimization problem associated with a class of discrete-time finite horizon Markov decision processes with…
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes?In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain…
We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be…
This note re-visits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with infinite-horizon discounted expected reward criterion. Distinguished from the classical value-iteration approach, we develop an…
Regularized MDPs serve as a smooth version of original MDPs. However, biased optimal policy always exists for regularized MDPs. Instead of making the coefficient{\lambda}of regularized term sufficiently small, we propose an adaptive…
We develop a quadratic regularization approach for the solution of high-dimensional multistage stochastic optimization problems characterized by a potentially large number of time periods/stages (e.g. hundreds), a high-dimensional resource…
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and…
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…
This paper considers an infinite-horizon Markov decision process (MDP) that allows for general non-exponential discount functions, in both discrete and continuous time. Due to the inherent time inconsistency, we look for a randomized…