Related papers: Value Iteration is Optic Composition

When to stop value iteration: stability and near-optimality versus computation

Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like…

Optimization and Control · Mathematics 2020-11-23 Mathieu Granzotto, Romain Postoyan, Dragan Nešić, Lucian Buşoniu, Jamal Daafouz

On the Complexity of Value Iteration

Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation which is naturally associated to the MDP. At the same time, value…

Formal Languages and Automata Theory · Computer Science 2019-04-30 Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, Mahsa Shirmohammadi

Parameterized Reinforcement Learning for Optical System Optimization

Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design…

Machine Learning · Computer Science 2021-11-17 Heribert Wankerl, Maike L. Stern, Ali Mahdavi, Christoph Eichler, Elmar W. Lang

Continuous Inverse Optimal Control with Locally Optimal Examples

Inverse optimal control, also known as inverse reinforcement learning, is the problem of recovering an unknown reward function in a Markov decision process from expert demonstrations of the optimal policy. We introduce a probabilistic…

Machine Learning · Computer Science 2012-06-22 Sergey Levine, Vladlen Koltun

Robust Value Iteration for Continuous Control Tasks

When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding…

Machine Learning · Computer Science 2021-05-27 Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further…

Machine Learning · Statistics 2017-10-31 Tadashi Kozuno, Eiji Uchibe, Kenji Doya

The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow…

Artificial Intelligence · Computer Science 2013-12-25 Eugene A. Feinberg, Jefferson Huang

Minimizing the Outage Probability in a Markov Decision Process

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that makes it possible to optimize an alternative objective: the probability that the gain is…

Machine Learning · Computer Science 2023-03-06 Vincent Corlay, Jean-Christophe Sibel

Deflated Dynamics Value Iteration

The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a…

Machine Learning · Computer Science 2025-06-12 Jongmin Lee, Amin Rakhsha, Ernest K. Ryu, Amir-massoud Farahmand

Policy Iteration for Relational MDPs

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…

Artificial Intelligence · Computer Science 2012-06-26 Chenggang Wang, Roni Khardon

Feature Markov Decision Processes

General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes…

Artificial Intelligence · Computer Science 2009-12-30 Marcus Hutter

Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming

This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given…

Optimization and Control · Mathematics 2020-01-29 Eugene A. Feinberg, Gaojin He

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…

Artificial Intelligence · Computer Science 2011-06-02 N. L. Zhang, W. Zhang

Action-State Dependent Dynamic Model Selection

A model among many may only be best under certain states of the world. Switching from one model to another can also be costly. Finding a procedure to dynamically choose a model in these circumstances requires solving a complex estimation…

Machine Learning · Computer Science 2023-10-10 Francesco Cordoni, Alessio Sancetta

Dynamic Programming Through the Lens of Semismooth Newton-Type Methods (Extended Version)

Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to…

Optimization and Control · Mathematics 2022-06-28 Matilde Gargiani, Andrea Zanelli, Dominic Liao-McPherson, Tyler Summers, John Lygeros

Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where…

Optimization and Control · Mathematics 2020-05-05 Dimitri Bertsekas

The Analysis of Optimization Algorithms, A Dissipativity Approach

Optimization problems in engineering and applied mathematics are typically solved in an iterative fashion, by systematically adjusting the variables of interest until an adequate solution is found. The iterative algorithms that govern these…

Optimization and Control · Mathematics 2022-05-31 Laurent Lessard

Value and Policy Iteration in Optimal Control and Adaptive Dynamic Programming

In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general…

Systems and Control · Computer Science 2015-10-05 Dimitri P. Bertsekas

Fast Value Iteration for Goal-Directed Markov Decision Processes

Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to…

Artificial Intelligence · Computer Science 2013-02-08 Nevin Lianwen Zhang, Weihong Zhang

Dynamic Programming: From Local Optimality to Global Optimality

In the theory of dynamic programming, an optimal policy is a policy whose lifetime value dominates that of all other policies from every possible initial condition in the state space. This raises a natural question: when does optimality…

Optimization and Control · Mathematics 2025-05-13 John Stachurski, Jingni Yang, Ziyue Yang