English
Related papers

Related papers: Approximate Modified Policy Iteration

200 papers

Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI has been well studied in the context of discounted and average-cost MDPs. In this…

Machine Learning · Computer Science 2024-02-16 Yashaswini Murthy , Mehrdad Moharrami , R. Srikant

We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…

Machine Learning · Computer Science 2022-10-31 Gellért Weisz , András György , Tadashi Kozuno , Csaba Szepesvári

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem in hand. Most current methods are geared towards exploiting the…

Machine Learning · Computer Science 2014-07-03 Amir-massoud Farahmand , Doina Precup , André M. S. Barreto , Mohammad Ghavamzadeh

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao , Nevena Lazic , Yasin Abbasi-Yadkori , Pooria Joulani , Csaba Szepesvari

We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal…

Optimization and Control · Mathematics 2013-04-23 Boris Lesner , Bruno Scherrer

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy…

Artificial Intelligence · Computer Science 2014-05-13 Bruno Scherrer

In this paper we discuss $\l$-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic…

Systems and Control · Computer Science 2015-07-07 Dimitri P. Bertsekas

Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical…

Machine Learning · Computer Science 2020-01-07 Nino Vieillard , Olivier Pietquin , Matthieu Geist

Entropy regularized algorithms such as Soft Q-learning and Soft Actor-Critic, recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics 2019-10-15 Elena Smirnova , Elvis Dohmatob

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy…

Artificial Intelligence · Computer Science 2011-09-13 A. Fern , R. Givan , S. Yoon

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…

Artificial Intelligence · Computer Science 2008-08-13 Istvan Szita , Andras Lorincz

Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an…

Recent successful deep reinforcement learning algorithms, such as Trust Region Policy Optimization (TRPO) or Proximal Policy Optimization (PPO), are fundamentally variations of conservative policy iteration (CPI). These algorithms iterate…

Machine Learning · Computer Science 2020-01-27 Erinc Merdivan , Sten Hanke , Matthieu Geist

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has shown that value functions in factored MDPs can often…

Artificial Intelligence · Computer Science 2013-01-18 Daphne Koller , Ron Parr

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the…

Artificial Intelligence · Computer Science 2013-01-30 Yishay Mansour , Satinder Singh

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in the infinite-horizon Markov decision processes. We prove the finite-iteration and asymptotic l\infty-norm…

Machine Learning · Computer Science 2011-09-09 Mohammad Gheshlaghi Azar , Vicenc Gomez , Hilbert J. Kappen

Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property they provide theoretical guarantees in value function approximation (VFA). In…

Machine Learning · Computer Science 2022-11-15 Mete Kemertas , Allan Jepson

In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy…

Machine Learning · Computer Science 2024-05-01 Lakshmi Mandal , Chandrashekar Lakshminarayanan , Shalabh Bhatnagar

In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during…

Optimization and Control · Mathematics 2021-06-03 Dimitri Bertsekas

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further…

Machine Learning · Statistics 2017-10-31 Tadashi Kozuno , Eiji Uchibe , Kenji Doya
‹ Prev 1 2 3 10 Next ›