Related papers: Efficient Inference in Markov Control Problems

On Supervised On-line Rolling-Horizon Control for Infinite-Horizon Discounted Markov Decision Processes

This note re-visits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with infinite-horizon discounted expected reward criterion. Distinguished from the classical value-iteration approach, we develop an…

Optimization and Control · Mathematics 2022-06-07 Hyeong Soo Chang

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest and for such…

Machine Learning · Computer Science 2025-03-21 Soumyajit Guin, Shalabh Bhatnagar

Optimal Continuous Time Markov Decisions

In the context of Markov decision processes running in continuous time, one of the most intriguing challenges is the efficient approximation of finite horizon reachability objectives. A multitude of sophisticated model checking algorithms…

Systems and Control · Computer Science 2015-08-03 Yuliya Butkova, Hassan Hatefi, Holger Hermanns, Jan Krcal

Markov Decision Processes of the Third Kind: Learning Distributions by Policy Gradient Descent

The goal of this paper is to analyze distributional Markov Decision Processes as a class of control problems in which the objective is to learn policies that steer the distribution of a cumulative reward toward a prescribed target law,…

Optimization and Control · Mathematics 2026-02-09 Nicole Bäuerle, Athanasios Vasileiadis

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we…

Machine Learning · Computer Science 2021-04-27 Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain

Risk-Averse $\omega$-regular Markov Decision Process Control

Many control problems in environments that can be modeled as Markov decision processes (MDPs) concern infinite-time horizon specifications. The classical aim in this context is to compute a control policy that maximizes the probability of…

Systems and Control · Computer Science 2017-05-03 Ruediger Ehlers, Salar Moarref, Ufuk Topcu

Finite-Horizon Markov Decision Processes with Sequentially-Observed Transitions

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…

Optimization and Control · Mathematics 2015-07-07 Mahmoud El Chamie, Behcet Acikmese

Linear convergence of a policy gradient method for some finite horizon continuous time control problems

Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient…

Optimization and Control · Mathematics 2022-12-27 Christoph Reisinger, Wolfgang Stockinger, Yufei Zhang

Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action

Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization poses significant challenges in understanding the global convergence of policy gradient methods. For a class of finite-horizon…

Optimization and Control · Mathematics 2026-03-10 Xin Chen, Yifan Hu, Minda Zhao

Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that…

Optimization and Control · Mathematics 2025-09-30 Mengmeng Li, Daniel Kuhn, Tobias Sutter

Coupling and a generalised Policy Iteration Algorithm in continuous time

We analyse a version of the policy iteration algorithm for the discounted infinite-horizon problem for controlled multidimensional diffusion processes, where both the drift and the diffusion coefficient can be controlled. We prove that,…

Probability · Mathematics 2017-07-26 Saul D. Jacka, Aleksandar Mijatovic, Dejan Siraj

Infinite horizon optimal control of forward-backward stochastic differential equations with delay

We consider a problem of optimal control of an infinite horizon system governed by forward-backward stochastic differential equations with delay. Sufficient and necessary maximum principles for optimal control under partial information in…

Optimization and Control · Mathematics 2013-12-09 Nacira Agram, Bernt Øksendal

Infinite Horizon Average Cost Dynamic Programming Subject to Total Variation Distance Ambiguity

We analyze the infinite horizon minimax average cost Markov Control Model (MCM), for a class of controlled process conditional distributions, which belong to a ball, with respect to total variation distance metric, centered at a known…

Optimization and Control · Mathematics 2015-12-22 Ioannis Tzortzis, Charalambos D. Charalambous, Themistoklis Charalambous

Deterministic Trajectory Optimization through Probabilistic Optimal Control

In this article, we discuss two algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called deterministic trajectory optimization problems. Both algorithms can be derived from an…

Optimization and Control · Mathematics 2024-12-10 Mohammad Mahmoudi Filabadi, Tom Lefebvre, Guillaume Crevecoeur

Exponential Lower Bounds For Policy Iteration

We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy iteration style algorithms have exponential lower bounds in a two player game setting. We extend these lower bounds to Markov…

Data Structures and Algorithms · Computer Science 2010-03-18 John Fearnley

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm,…

Machine Learning · Computer Science 2024-02-06 Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Risk-sensitive control of continuous time Markov chains

We study risk-sensitive control of continuous time Markov chains taking values in discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterise the value function via HJB equation…

Optimization and Control · Mathematics 2014-09-16 Mrinal K. Ghosh, Subhamay Saha

An indirect computational procedure for receding horizon hybrid optimal control

In this work, solution of the finite horizon hybrid optimal control problem as the central element of the receding horizon optimal control (model predictive control) is investigated based on the indirect approach. The response of a hybrid…

Systems and Control · Computer Science 2020-09-24 Babak Tavassoli

Discrete-Time Approximations of Controlled Diffusions with Infinite Horizon Discounted and Average Cost

We present discrete-time approximation of optimal control policies for infinite horizon discounted/ergodic control problems for controlled diffusions in $\mathbb{R}^d$. In particular, our objective is to show near optimality of optimal policies…

Optimization and Control · Mathematics 2025-02-11 Somnath Pradhan, Serdar Yuksel

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…

Machine Learning · Computer Science 2024-03-12 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor