Related papers: Performance Bounds for Lambda Policy Iteration and…

200 papers

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…

Machine Learning · Computer Science 2024-03-12 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

In this paper, we propose a new policy iteration algorithm to compute the value function and the optimal controls of continuous time stochastic control problems. The algorithm relies on successive approximations using linear-quadratic…

Optimization and Control · Mathematics 2024-09-09 Dylan Possamaï, Ludovic Tangpi

In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general…

Systems and Control · Computer Science 2015-10-05 Dimitri P. Bertsekas

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs) and games (CTMGs). Existing techniques for this problem use discretisation techniques to break time into discrete intervals, and optimal…

Computer Science and Game Theory · Computer Science 2011-07-11 John Fearnley, Markus Rabe, Sven Schewe, Lijun Zhang

We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal…

Optimization and Control · Mathematics 2013-04-23 Boris Lesner, Bruno Scherrer

In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy…

Machine Learning · Computer Science 2021-02-26 Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, that compute an approximately optimal policy by following the standard Policy Iteration (PI)…

Artificial Intelligence · Computer Science 2013-06-04 Bruno Scherrer

Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every…

Discrete Mathematics · Computer Science 2023-10-10 Ritesh Goenka, Eashan Gupta, Sushil Khyalia, Pratyush Agarwal, Mulinti Shaik Wajid, Shivaram Kalyanakrishnan

Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making where the outcomes and future possible actions are deterministically determined by the current action taken. DMDPs can be viewed as a finite…

Artificial Intelligence · Computer Science 2025-06-17 Ali Asadi, Krishnendu Chatterjee, Jakob de Raaij

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy…

Artificial Intelligence · Computer Science 2014-05-13 Bruno Scherrer

In this paper, we focus on formal synthesis of control policies for finite Markov decision processes with non-negative real-valued costs. We develop an algorithm to automatically generate a policy that guarantees the satisfaction of a…

Logic in Computer Science · Computer Science 2013-09-10 Maria Svorenova, Ivana Cerna, Calin Belta

This paper studies an infinite horizon optimal control problem for discrete-time linear system and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. In this general…

Optimization and Control · Mathematics 2024-03-04 Deyue Li

Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an…

Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat…

Artificial Intelligence · Computer Science 2013-01-07 Omid Madani

We consider zero-sum stochastic games with finite state and action spaces, perfect information, mean payoff criteria, without any irreducibility assumption on the Markov chains associated to strategies (multichain games). The value of such…

Optimization and Control · Mathematics 2012-08-03 Marianne Akian, Jean Cochet-Terrasson, Sylvie Detournay, Stéphane Gaubert

We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown policy iteration style algorithms have exponential lower bounds in a two player game setting. We extend these lower bounds to Markov…

Data Structures and Algorithms · Computer Science 2010-03-18 John Fearnley

For a general entropy-regularized stochastic control problem on an infinite horizon, we prove that a policy iteration algorithm (PIA) converges to an optimal relaxed control. Contrary to the standard stochastic control literature, classical…

Optimization and Control · Mathematics 2026-05-14 Yu-Jui Huang, Zhenhua Wang, Zhou Zhou

We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework. With this framework, we define an intuitive family of control policies that include…

Machine Learning · Computer Science 2021-06-17 Seungki Min, Costis Maglaras, Ciamac C. Moallemi

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…

Artificial Intelligence · Computer Science 2012-06-26 Chenggang Wang, Roni Khardon