Related papers: Dynamic Policy Programming

Approximate Policy Iteration Schemes: A Comparison

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy…

Artificial Intelligence · Computer Science 2014-05-13 Bruno Scherrer

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, that compute an approximately optimal policy by following the standard Policy Iteration (PI)…

Artificial Intelligence · Computer Science 2013-06-04 Bruno Scherrer

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal…

Optimization and Control · Mathematics 2013-04-23 Boris Lesner , Bruno Scherrer

Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds

Approximate dynamic programming is a popular method for solving large Markov decision processes. This paper describes a new class of approximate dynamic programming (ADP) methods- distributionally robust ADP-that address the curse of…

Machine Learning · Statistics 2012-05-22 Marek Petrik

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…

Machine Learning · Computer Science 2022-10-31 Gellért Weisz , András György , Tadashi Kozuno , Csaba Szepesvári

Approximate Modified Policy Iteration

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form…

Artificial Intelligence · Computer Science 2012-05-21 Bruno Scherrer , Victor Gabillon , Mohammad Ghavamzadeh , Matthieu Geist

Incremental Policy Iteration for Unknown Nonlinear Systems with Stability and Performance Guarantees

This paper proposes a general incremental policy iteration adaptive dynamic programming (ADP) algorithm for model-free robust optimal control of unknown nonlinear systems. The approach integrates recursive least squares estimation with…

Optimization and Control · Mathematics 2025-09-01 Qingkai Meng , Fenglan Wang , Lin Zhao

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy…

Machine Learning · Computer Science 2024-05-01 Lakshmi Mandal , Chandrashekar Lakshminarayanan , Shalabh Bhatnagar

A Structure-aware Online Learning Algorithm for Markov Decision Processes

To overcome the curse of dimensionality and curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider…

Machine Learning · Computer Science 2018-11-29 Arghyadip Roy , Vivek Borkar , Abhay Karandikar , Prasanna Chaporkar

Online Markov decision processes with policy iteration

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and…

Machine Learning · Computer Science 2015-10-16 Yao Ma , Hao Zhang , Masashi Sugiyama

Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where…

Optimization and Control · Mathematics 2020-05-05 Dimitri Bertsekas

Adaptive Approximate Policy Iteration

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao , Nevena Lazic , Yasin Abbasi-Yadkori , Pooria Joulani , Csaba Szepesvari

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems,…

Optimization and Control · Mathematics 2024-05-07 Sara Klein , Simon Weissmann , Leif Döring

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…

Optimization and Control · Mathematics 2023-08-08 Hyeong Soo Chang

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor…

Data Structures and Algorithms · Computer Science 2020-12-24 Aaron Sidford , Mengdi Wang , Xian Wu , Yinyu Ye

Optimistic Planning by Regularized Dynamic Programming

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This…

Machine Learning · Computer Science 2023-06-16 Antoine Moulin , Gergely Neu

Dual Policy Iteration

Recently, a novel class of Approximate Policy Iteration (API) algorithms have demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes,…

Machine Learning · Computer Science 2019-04-09 Wen Sun , Geoffrey J. Gordon , Byron Boots , J. Andrew Bagnell

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…

Optimization and Control · Mathematics 2015-07-08 Mahmoud El Chamie , Behcet Acikmese

Lower Bound on Howard Policy Iteration for Deterministic Markov Decision Processes

Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making where the outcomes and future possible actions are deterministically determined by the current action taken. DMDPs can be viewed as a finite…

Artificial Intelligence · Computer Science 2025-06-17 Ali Asadi , Krishnendu Chatterjee , Jakob de Raaij

Global Algorithms for Mean-Variance Optimization in Markov Decision Processes

Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming. In this paper, we propose a new approach to find the globally optimal policy for…

Optimization and Control · Mathematics 2023-02-28 Li Xia , Shuai Ma