Related papers: Deep Conservative Policy Iteration

Approximate Policy Iteration Schemes: A Comparison

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy…

Artificial Intelligence · Computer Science 2014-05-13 Bruno Scherrer

Approximate Modified Policy Iteration

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form…

Artificial Intelligence · Computer Science 2012-05-21 Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…

Machine Learning · Computer Science 2022-10-31 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem in hand. Most current methods are geared towards exploiting the…

Machine Learning · Computer Science 2014-07-03 Amir-massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

Monotone and Conservative Policy Iteration Beyond the Tabular Case

We introduce Reliable Policy Iteration (RPI) and Conservative RPI (CRPI), variants of Policy Iteration (PI) and Conservative PI (CPI), that retain tabular guarantees under function approximation. RPI uses a novel Bellman-constrained…

Machine Learning · Computer Science 2026-04-03 S. R. Eshwar, Gugan Thoppe, Ananyabrata Barua, Aditya Gopalan, Gal Dalal

Dual Policy Iteration

Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes,…

Machine Learning · Computer Science 2019-04-09 Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

Incremental Policy Iteration for Unknown Nonlinear Systems with Stability and Performance Guarantees

This paper proposes a general incremental policy iteration adaptive dynamic programming (ADP) algorithm for model-free robust optimal control of unknown nonlinear systems. The approach integrates recursive least squares estimation with…

Optimization and Control · Mathematics 2025-09-01 Qingkai Meng, Fenglan Wang, Lin Zhao

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, which compute an approximately optimal policy by following the standard Policy Iteration (PI)…

Artificial Intelligence · Computer Science 2013-06-04 Bruno Scherrer

Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space -- Fundamental Theory and Methods

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the fundamental for…

Artificial Intelligence · Computer Science 2021-04-06 Jaeyoung Lee, Richard S. Sutton

Modified Actor-Critics

Recent successful deep reinforcement learning algorithms, such as Trust Region Policy Optimization (TRPO) or Proximal Policy Optimization (PPO), are fundamentally variations of conservative policy iteration (CPI). These algorithms iterate…

Machine Learning · Computer Science 2020-01-27 Erinc Merdivan, Sten Hanke, Matthieu Geist

Mitigation of Adversarial Policy Imitation via Constrained Randomization of Policy (CRoP)

Deep reinforcement learning (DRL) policies are vulnerable to unauthorized replication attacks, where an adversary exploits imitation learning to reproduce target policies from observed behavior. In this paper, we propose Constrained…

Machine Learning · Computer Science 2021-10-01 Nancirose Piazza, Vahid Behzadan

Reliable Policy Iteration: Performance Robustness Across Architecture and Environment Perturbations

In a recent work, we proposed Reliable Policy Iteration (RPI), which restores policy iteration's monotonicity-of-value-estimates property to the function approximation setting. Here, we assess the robustness of RPI's empirical performance on…

Artificial Intelligence · Computer Science 2025-12-16 S. R. Eshwar, Aniruddha Mukherjee, Kintan Saha, Krishna Agarwal, Gugan Thoppe, Aditya Gopalan, Gal Dalal

Adaptive Approximate Policy Iteration

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

Scaling policy iteration based reinforcement learning for unknown discrete-time linear systems

In optimal control problems, policy iteration (PI) is a powerful reinforcement learning (RL) tool for designing optimal controllers for linear systems. However, the need for an initial stabilizing control policy significantly limits…

Optimization and Control · Mathematics 2024-11-13 Zhen Pang, Shengda Tang, Jun Cheng, Shuping He

Deep Policy Iteration for High-Dimensional Mean Field Games

This paper introduces Deep Policy Iteration (DPI), a novel approach that integrates the strengths of Neural Networks with the stability and convergence advantages of Policy Iteration (PI) to address high-dimensional stochastic Mean Field…

Optimization and Control · Mathematics 2024-07-15 Mouhcine Assouli, Badr Missaoui

Lower Bounds for Policy Iteration on Multi-action MDPs

Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an…

Machine Learning · Computer Science 2020-09-18 Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan

Policy iteration for discrete-time systems with discounted costs: stability and near-optimality guarantees

Given a discounted cost, we study deterministic discrete-time systems whose inputs are generated by policy iteration (PI). We provide novel near-optimality and stability properties, while allowing for non-stabilizing initial policies. That…

Optimization and Control · Mathematics 2024-03-29 Jonathan de Brusse, Mathieu Granzotto, Romain Postoyan, Dragan Nešić

Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints

This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state…

Systems and Control · Electrical Eng. & Systems 2022-04-11 Jingliang Duan, Zhengyu Liu, Shengbo Eben Li, Qi Sun, Zhenzhong Jia, Bo Cheng

Feasible Policy Iteration for Safe Reinforcement Learning

Safety is the priority concern when applying reinforcement learning (RL) algorithms to real-world control problems. While policy iteration provides a fundamental algorithm for standard RL, an analogous theoretical algorithm for safe RL…

Machine Learning · Computer Science 2025-03-14 Yujie Yang, Zhilong Zheng, Shengbo Eben Li, Wei Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Entropy-regularized algorithms such as Soft Q-learning and Soft Actor-Critic recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics 2019-10-15 Elena Smirnova, Elvis Dohmatob