Related papers: Performance Bounds for Lambda Policy Iteration and…

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…

Machine Learning · Computer Science 2024-03-12 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

A policy iteration algorithm for non-Markovian control problems

In this paper, we propose a new policy iteration algorithm to compute the value function and the optimal controls of continuous time stochastic control problems. The algorithm relies on successive approximations using linear-quadratic…

Optimization and Control · Mathematics 2024-09-09 Dylan Possamaï, Ludovic Tangpi

Value and Policy Iteration in Optimal Control and Adaptive Dynamic Programming

In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general…

Systems and Control · Computer Science 2015-10-05 Dimitri P. Bertsekas

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Efficient Approximation of Optimal Control for Markov Games

We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs) and games (CTMGs). Existing techniques for this problem use discretisation techniques to break time into discrete intervals, and optimal…

Computer Science and Game Theory · Computer Science 2011-07-11 John Fearnley, Markus Rabe, Sven Schewe, Lijun Zhang

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal…

Optimization and Control · Mathematics 2013-04-23 Boris Lesner, Bruno Scherrer

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy…

Machine Learning · Computer Science 2021-02-26 Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, that compute an approximately optimal policy by following the standard Policy Iteration (PI)…

Artificial Intelligence · Computer Science 2013-06-04 Bruno Scherrer

Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every…

Discrete Mathematics · Computer Science 2023-10-10 Ritesh Goenka, Eashan Gupta, Sushil Khyalia, Pratyush Agarwal, Mulinti Shaik Wajid, Shivaram Kalyanakrishnan

Lower Bound on Howard Policy Iteration for Deterministic Markov Decision Processes

Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making where the outcomes and future possible actions are deterministically determined by the current action taken. DMDPs can be viewed as a finite…

Artificial Intelligence · Computer Science 2025-06-17 Ali Asadi, Krishnendu Chatterjee, Jakob de Raaij

Approximate Policy Iteration Schemes: A Comparison

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy…

Artificial Intelligence · Computer Science 2014-05-13 Bruno Scherrer

Optimal Control of MDPs with Temporal Logic Constraints

In this paper, we focus on formal synthesis of control policies for finite Markov decision processes with non-negative real-valued costs. We develop an algorithm to automatically generate a policy that guarantees the satisfaction of a…

Logic in Computer Science · Computer Science 2013-09-10 Maria Svorenova, Ivana Cerna, Calin Belta

Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

This paper studies an infinite horizon optimal control problem for discrete-time linear system and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. In this general…

Optimization and Control · Mathematics 2024-03-04 Deyue Li

Lower Bounds for Policy Iteration on Multi-action MDPs

Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an…

Machine Learning · Computer Science 2020-09-18 Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan

Polynomial Value Iteration Algorithms for Deterministic MDPs

Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat…

Artificial Intelligence · Computer Science 2013-01-07 Omid Madani

Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information

We consider zero-sum stochastic games with finite state and action spaces, perfect information, mean payoff criteria, without any irreducibility assumption on the Markov chains associated to strategies (multichain games). The value of such…

Optimization and Control · Mathematics 2012-08-03 Marianne Akian, Jean Cochet-Terrasson, Sylvie Detournay, Stéphane Gaubert

Exponential Lower Bounds For Policy Iteration

We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown policy iteration style algorithms have exponential lower bounds in a two player game setting. We extend these lower bounds to Markov…

Data Structures and Algorithms · Computer Science 2010-03-18 John Fearnley

Convergence of Policy Iteration for Entropy-Regularized Stochastic Control Problems

For a general entropy-regularized stochastic control problem on an infinite horizon, we prove that a policy iteration algorithm (PIA) converges to an optimal relaxed control. Contrary to the standard stochastic control literature, classical…

Optimization and Control · Mathematics 2026-05-14 Yu-Jui Huang, Zhenhua Wang, Zhou Zhou

Thompson Sampling with Information Relaxation Penalties

We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework. With this framework, we define an intuitive family of control policies that include…

Machine Learning · Computer Science 2021-06-17 Seungki Min, Costis Maglaras, Ciamac C. Moallemi

Policy Iteration for Relational MDPs

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…

Artificial Intelligence · Computer Science 2012-06-26 Chenggang Wang, Roni Khardon