English
Related papers

Related papers: Value Iteration is Optic Composition

200 papers

Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like…

Optimization and Control · Mathematics 2020-11-23 Mathieu Granzotto , Romain Postoyan , Dragan Nešić , Lucian Buşoniu , Jamal Daafouz

Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation which is naturally associated to the MDP. At the same time, value…

Formal Languages and Automata Theory · Computer Science 2019-04-30 Nikhil Balaji , Stefan Kiefer , Petr Novotný , Guillermo A. Pérez , Mahsa Shirmohammadi

Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design…

Machine Learning · Computer Science 2021-11-17 Heribert Wankerl , Maike L. Stern , Ali Mahdavi , Christoph Eichler , Elmar W. Lang

Inverse optimal control, also known as inverse reinforcement learning, is the problem of recovering an unknown reward function in a Markov decision process from expert demonstrations of the optimal policy. We introduce a probabilistic…

Machine Learning · Computer Science 2012-06-22 Sergey Levine , Vladlen Koltun

When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding…

Machine Learning · Computer Science 2021-05-27 Michael Lutter , Shie Mannor , Jan Peters , Dieter Fox , Animesh Garg

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further…

Machine Learning · Statistics 2017-10-31 Tadashi Kozuno , Eiji Uchibe , Kenji Doya

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow…

Artificial Intelligence · Computer Science 2013-12-25 Eugene A. Feinberg , Jefferson Huang

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm which enables to optimize an alternative objective: the probability that the gain is…

Machine Learning · Computer Science 2023-03-06 Vincent Corlay , Jean-Christophe Sibel

The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a…

Machine Learning · Computer Science 2025-06-12 Jongmin Lee , Amin Rakhsha , Ernest K. Ryu , Amir-massoud Farahmand

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…

Artificial Intelligence · Computer Science 2012-06-26 Chenggang Wang , Roni Khardon

General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes…

Artificial Intelligence · Computer Science 2009-12-30 Marcus Hutter

This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given…

Optimization and Control · Mathematics 2020-01-29 Eugene A. Feinberg , Gaojin He

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…

Artificial Intelligence · Computer Science 2011-06-02 N. L. Zhang , W. Zhang

A model among many may only be best under certain states of the world. Switching from a model to another can also be costly. Finding a procedure to dynamically choose a model in these circumstances requires to solve a complex estimation…

Machine Learning · Computer Science 2023-10-10 Francesco Cordoni , Alessio Sancetta

Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to…

Optimization and Control · Mathematics 2022-06-28 Matilde Gargiani , Andrea Zanelli , Dominic Liao-McPherson , Tyler Summers , John Lygeros

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where…

Optimization and Control · Mathematics 2020-05-05 Dimitri Bertsekas

Optimization problems in engineering and applied mathematics are typically solved in an iterative fashion, by systematically adjusting the variables of interest until an adequate solution is found. The iterative algorithms that govern these…

Optimization and Control · Mathematics 2022-05-31 Laurent Lessard

In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general…

Systems and Control · Computer Science 2015-10-05 Dimitri P. Bertsekas

Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to…

Artificial Intelligence · Computer Science 2013-02-08 Nevin Lianwen Zhang , Weihong Zhang

In the theory of dynamic programming, an optimal policy is a policy whose lifetime value dominates that of all other policies from every possible initial condition in the state space. This raises a natural question: when does optimality…

Optimization and Control · Mathematics 2025-05-13 John Stachurski , Jingni Yang , Ziyue Yang
‹ Prev 1 2 3 10 Next ›