Related papers: Optimistic Planning by Regularized Dynamic Program…

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Predictable Interval MDPs through Entropy Regularization

Regularization of control policies using entropy can be instrumental in adjusting predictability of real-world systems. Applications benefiting from such approaches range from, e.g., cybersecurity, which aims at maximal unpredictability, to…

Systems and Control · Electrical Eng. & Systems 2026-02-18 Menno van Zutphen, Giannis Delimpaltadakis, Maurice Heemels, Duarte Antunes

Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds

Approximate dynamic programming is a popular method for solving large Markov decision processes. This paper describes a new class of approximate dynamic programming (ADP) methods, distributionally robust ADP, that address the curse of…

Machine Learning · Statistics 2012-05-22 Marek Petrik

Computing monotone policies for Markov decision processes: a nearly-isotonic penalty approach

This paper discusses algorithms for solving Markov decision processes (MDPs) that have monotone optimal policies. We propose a two-stage alternating convex optimization scheme that can accelerate the search for an optimal policy by…

Systems and Control · Computer Science 2017-04-04 Robert Mattila, Cristian R. Rojas, Vikram Krishnamurthy, Bo Wahlberg

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which…

Artificial Intelligence · Computer Science 2020-02-28 Tomas Brazdil, Krishnendu Chatterjee, Petr Novotny, Jiri Vahala

A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term. The extant…

Machine Learning · Statistics 2019-10-22 Xiang Li, Wenhao Yang, Zhihua Zhang

Rollout-Based Approximate Dynamic Programming for MDPs with Information-Theoretic Constraints

This paper studies a finite-horizon Markov decision problem with information-theoretic constraints, where the goal is to minimize directed information from the controlled source process to the control process, subject to stage-wise cost…

Systems and Control · Electrical Eng. & Systems 2025-09-04 Zixuan He, Charalambos D. Charalambous, Photios A. Stavrou

Minimizing the Outage Probability in a Markov Decision Process

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that makes it possible to optimize an alternative objective: the probability that the gain is…

Machine Learning · Computer Science 2023-03-06 Vincent Corlay, Jean-Christophe Sibel

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we…

Machine Learning · Computer Science 2021-04-27 Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…

Optimization and Control · Mathematics 2023-08-08 Hyeong Soo Chang

Finite-State Approximations to Discounted and Average Cost Constrained Markov Decision Processes

In this paper, we consider the finite-state approximation of a discrete-time constrained Markov decision process (MDP) under the discounted and average cost criteria. Using the linear programming formulation of the constrained discounted…

Optimization and Control · Mathematics 2018-07-10 Naci Saldi

A Dynamic Programming Approach to Evaluating Multivariate Gaussian Probabilities

We propose a method of approximating multivariate Gaussian probabilities using dynamic programming. We show that solving the optimization problem associated with a class of discrete-time finite horizon Markov decision processes with…

Optimization and Control · Mathematics 2018-02-08 Morgan Jones, Matthew M. Peet

Beyond Average Return in Markov Decision Processes

What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain…

Artificial Intelligence · Computer Science 2024-02-20 Alexandre Marthe, Aurélien Garivier, Claire Vernade

Approximate Value Iteration for Risk-aware Markov Decision Processes

We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be…

Systems and Control · Computer Science 2017-05-17 Pengqian Yu, William B. Haskell, Huan Xu

On Supervised On-line Rolling-Horizon Control for Infinite-Horizon Discounted Markov Decision Processes

This note re-visits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with infinite-horizon discounted expected reward criterion. Distinguished from the classical value-iteration approach, we develop an…

Optimization and Control · Mathematics 2022-06-07 Hyeong Soo Chang

Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs

Regularized MDPs serve as a smooth version of original MDPs. However, biased optimal policy always exists for regularized MDPs. Instead of making the coefficient λ of the regularized term sufficiently small, we propose an adaptive…

Machine Learning · Computer Science 2020-11-03 Wenhao Yang, Xiang Li, Guangzeng Xie, Zhihua Zhang

Regularized Decomposition of High-Dimensional Multistage Stochastic Programs with Markov Uncertainty

We develop a quadratic regularization approach for the solution of high-dimensional multistage stochastic optimization problems characterized by a potentially large number of time periods/stages (e.g. hundreds), a high-dimensional resource…

Optimization and Control · Mathematics 2017-02-28 Tsvetan Asamov, Warren B. Powell

Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization

Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and…

Machine Learning · Computer Science 2023-03-14 Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…

Optimization and Control · Mathematics 2015-07-08 Mahmoud El Chamie, Behcet Acikmese

Relaxed Equilibria for Time-Inconsistent Markov Decision Processes

This paper considers an infinite-horizon Markov decision process (MDP) that allows for general non-exponential discount functions, in both discrete and continuous time. Due to the inherent time inconsistency, we look for a randomized…

Optimization and Control · Mathematics 2024-12-10 Erhan Bayraktar, Yu-Jui Huang, Zhenhua Wang, Zhou Zhou