Related papers: Deep Conservative Policy Iteration

200 papers

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy…

Artificial Intelligence · Computer Science 2014-05-13 Bruno Scherrer

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form…

Artificial Intelligence · Computer Science 2012-05-21 Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist

We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…

Machine Learning · Computer Science 2022-10-31 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem at hand. Most current methods are geared towards exploiting the…

Machine Learning · Computer Science 2014-07-03 Amir-massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

We introduce Reliable Policy Iteration (RPI) and Conservative RPI (CRPI), variants of Policy Iteration (PI) and Conservative PI (CPI), that retain tabular guarantees under function approximation. RPI uses a novel Bellman-constrained…

Machine Learning · Computer Science 2026-04-03 S. R. Eshwar, Gugan Thoppe, Ananyabrata Barua, Aditya Gopalan, Gal Dalal

Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes,…

Machine Learning · Computer Science 2019-04-09 Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

This paper proposes a general incremental policy iteration adaptive dynamic programming (ADP) algorithm for model-free robust optimal control of unknown nonlinear systems. The approach integrates recursive least squares estimation with…

Optimization and Control · Mathematics 2025-09-01 Qingkai Meng, Fenglan Wang, Lin Zhao

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms that compute an approximately optimal policy by following the standard Policy Iteration (PI)…

Artificial Intelligence · Computer Science 2013-06-04 Bruno Scherrer

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the fundamental for…

Artificial Intelligence · Computer Science 2021-04-06 Jaeyoung Lee, Richard S. Sutton

Recent successful deep reinforcement learning algorithms, such as Trust Region Policy Optimization (TRPO) or Proximal Policy Optimization (PPO), are fundamentally variations of conservative policy iteration (CPI). These algorithms iterate…

Machine Learning · Computer Science 2020-01-27 Erinc Merdivan, Sten Hanke, Matthieu Geist

Deep reinforcement learning (DRL) policies are vulnerable to unauthorized replication attacks, where an adversary exploits imitation learning to reproduce target policies from observed behavior. In this paper, we propose Constrained…

Machine Learning · Computer Science 2021-10-01 Nancirose Piazza, Vahid Behzadan

In a recent work, we proposed Reliable Policy Iteration (RPI), which restores policy iteration's monotonicity-of-value-estimates property to the function approximation setting. Here, we assess the robustness of RPI's empirical performance on…

Artificial Intelligence · Computer Science 2025-12-16 S. R. Eshwar, Aniruddha Mukherjee, Kintan Saha, Krishna Agarwal, Gugan Thoppe, Aditya Gopalan, Gal Dalal

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

In optimal control problems, policy iteration (PI) is a powerful reinforcement learning (RL) tool for designing optimal controllers for linear systems. However, the need for an initial stabilizing control policy significantly limits…

Optimization and Control · Mathematics 2024-11-13 Zhen Pang, Shengda Tang, Jun Cheng, Shuping He

This paper introduces Deep Policy Iteration (DPI), a novel approach that integrates the strengths of Neural Networks with the stability and convergence advantages of Policy Iteration (PI) to address high-dimensional stochastic Mean Field…

Optimization and Control · Mathematics 2024-07-15 Mouhcine Assouli, Badr Missaoui

Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an…

Given a discounted cost, we study deterministic discrete-time systems whose inputs are generated by policy iteration (PI). We provide novel near-optimality and stability properties, while allowing for non-stabilizing initial policies. That…

Optimization and Control · Mathematics 2024-03-29 Jonathan de Brusse, Mathieu Granzotto, Romain Postoyan, Dragan Nešić

This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state…

Systems and Control · Electrical Eng. & Systems 2022-04-11 Jingliang Duan, Zhengyu Liu, Shengbo Eben Li, Qi Sun, Zhenzhong Jia, Bo Cheng

Safety is the priority concern when applying reinforcement learning (RL) algorithms to real-world control problems. While policy iteration provides a fundamental algorithm for standard RL, an analogous theoretical algorithm for safe RL…

Machine Learning · Computer Science 2025-03-14 Yujie Yang, Zhilong Zheng, Shengbo Eben Li, Wei Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

Entropy-regularized algorithms such as Soft Q-learning and Soft Actor-Critic recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics 2019-10-15 Elena Smirnova, Elvis Dohmatob