Related papers: Optimistic Value Iteration

Sound Value Iteration

Computing reachability probabilities is at the heart of probabilistic model checking. All model checkers compute these probabilities in an iterative fashion using value iteration. This technique approximates a fixed point from below by…

Logic in Computer Science · Computer Science · 2018-04-16 · Tim Quatmann, Joost-Pieter Katoen

Tighter Value-Function Approximations for POMDPs

Solving partially observable Markov decision processes (POMDPs) typically requires reasoning about the values of exponentially many state beliefs. Towards practical performance, state-of-the-art solvers use value bounds to guide this…

Artificial Intelligence · Computer Science · 2025-02-11 · Merlijn Krale, Wietze Koops, Sebastian Junges, Thiago D. Simão, Nils Jansen

Sound Value Iteration for Simple Stochastic Games

Algorithmic analysis of Markov decision processes (MDP) and stochastic games (SG) in practice relies on value-iteration (VI) algorithms. Since basic VI does not provide guarantees on the precision of the result, variants of VI have been…

Computer Science and Game Theory · Computer Science · 2025-09-18 · Muqsit Azeem, Jan Křetínský, Maximilian Weininger

Sound Value Iteration for Simple Stochastic Games

Algorithmic analysis of Markov decision processes (MDP) and stochastic games (SG) in practice relies on value-iteration (VI) algorithms. Since the basic version of VI does not provide guarantees on the precision of the result, variants of…

Computer Science and Game Theory · Computer Science · 2026-03-31 · Muqsit Azeem, Jan Křetínský, Maximilian Weininger

Fast Value Iteration for Goal-Directed Markov Decision Processes

Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to…

Artificial Intelligence · Computer Science · 2013-02-08 · Nevin Lianwen Zhang, Weihong Zhang

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…

Artificial Intelligence · Computer Science · 2011-06-02 · N. L. Zhang, W. Zhang

Minimizing the Outage Probability in a Markov Decision Process

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that makes it possible to optimize an alternative objective: the probability that the gain is…

Machine Learning · Computer Science · 2023-03-06 · Vincent Corlay, Jean-Christophe Sibel

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

We study algorithms using randomized value functions for exploration in reinforcement learning. Algorithms of this type enjoy appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a…

Machine Learning · Computer Science · 2022-10-14 · Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…

Performance · Computer Science · 2017-09-08 · Jan Křetínský, Tobias Meggendorfer

Value Iteration for Long-run Average Reward in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…

Systems and Control · Computer Science · 2017-09-01 · Pranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca, Jan Křetínský, Tobias Meggendorfer

Lightweight Monte Carlo Verification of Markov Decision Processes with Rewards

Markov decision processes are useful models of concurrency optimisation problems, but are often intractable for exhaustive verification methods. Recent work has introduced lightweight approximative techniques that sample directly from…

Logic in Computer Science · Computer Science · 2015-03-24 · Axel Legay, Sean Sedwards, Louis-Marie Traonouez

Optimistic Planning by Regularized Dynamic Programming

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This…

Machine Learning · Computer Science · 2023-06-16 · Antoine Moulin, Gergely Neu

Generalized Second Order Value Iteration in Markov Decision Processes

Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…

Machine Learning · Computer Science · 2021-09-21 · Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

Optimistic and Topological Value Iteration for Simple Stochastic Games

While value iteration (VI) is a standard solution approach to simple stochastic games (SSGs), it has suffered from the lack of a stopping criterion. Recently, several solutions have appeared, among them also "optimistic" VI (OVI). However, OVI…

Computer Science and Game Theory · Computer Science · 2022-08-01 · Muqsit Azeem, Alexandros Evangelidis, Jan Křetínský, Alexander Slivinskiy, Maximilian Weininger

Analysis of Value Iteration Through Absolute Probability Sequences

Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). While previous studies have extensively analyzed its convergence properties, they primarily focus on convergence with respect to the infinity norm. In…

Machine Learning · Computer Science · 2025-02-06 · Arsenii Mustafin, Sebastien Colla, Alex Olshevsky, Ioannis Ch. Paschalidis

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science · 2026-02-10 · Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Restricted Value Iteration: Theory and Algorithms

Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this…

Artificial Intelligence · Computer Science · 2011-07-04 · N. L. Zhang, W. Zhang

Exponential Lower Bounds For Policy Iteration

We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy-iteration-style algorithms have exponential lower bounds in a two-player game setting. We extend these lower bounds to Markov…

Data Structures and Algorithms · Computer Science · 2010-03-18 · John Fearnley

Efficient iterative policy optimization

We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically…

Artificial Intelligence · Computer Science · 2016-12-30 · Nicolas Le Roux

Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming

This note provides upper bounds on the number of operations required to compute, by value iteration, a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given…

Optimization and Control · Mathematics · 2020-01-29 · Eugene A. Feinberg, Gaojin He