English
Related papers

Related papers: Acceleration Operators in the Value Iteration Algo…

200 papers

We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes with the total expected discounted reward. Inspired by the monotone behavior of the contraction mappings in…

Optimization and Control · Mathematics 2008-03-28 Oleksandr Shlakhter , Chi-Guhn Lee , Dmitry Khmelev , Nasser Jaber

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…

Performance · Computer Science 2017-09-08 Jan Křetínský , Tobias Meggendorfer

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…

Systems and Control · Computer Science 2017-09-01 Pranav Ashok , Krishnendu Chatterjee , Przemyslaw Daca , Jan Křetínský , Tobias Meggendorfer

Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…

Machine Learning · Computer Science 2021-09-21 Chandramouli Kamanchi , Raghuram Bharadwaj Diddigi , Shalabh Bhatnagar

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…

Artificial Intelligence · Computer Science 2011-06-02 N. L. Zhang , W. Zhang

Value iteration is a well-known method of solving Markov Decision Processes (MDPs) that is simple to implement and boasts strong theoretical convergence guarantees. However, the computational cost of value iteration quickly becomes…

Machine Learning · Computer Science 2021-07-26 Guanting Chen , Johann Demetrio Gaebler , Matt Peng , Chunlin Sun , Yinyu Ye

Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…

Optimization and Control · Mathematics 2021-08-30 Vineet Goyal , Julien Grand-Clement

Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat…

Artificial Intelligence · Computer Science 2013-01-07 Omid Madani

We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to…

Optimization and Control · Mathematics 2026-04-23 Matthew Zurek , Yudong Chen

Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to…

Artificial Intelligence · Computer Science 2013-02-08 Nevin Lianwen Zhang , Weihong Zhang

Many sequential decision problems can be formulated as Markov Decision Processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state…

Optimization and Control · Mathematics 2015-09-03 Daniel R. Jiang , Warren B. Powell

Interval Markov decision processes are a class of Markov models where the transition probabilities between the states belong to intervals. In this paper, we study the problem of efficient estimation of the optimal policies in Interval…

Systems and Control · Electrical Eng. & Systems 2023-09-19 Saber Jafarpour , Samuel Coogan

This paper discusses algorithms for solving Markov decision processes (MDPs) that have monotone optimal policies. We propose a two-stage alternating convex optimization scheme that can accelerate the search for an optimal policy by…

Systems and Control · Computer Science 2017-04-04 Robert Mattila , Cristian R. Rojas , Vikram Krishnamurthy , Bo Wahlberg

While there is an extensive body of research on the analysis of Value Iteration (VI) for discounted cumulative-reward MDPs, prior work on analyzing VI for (undiscounted) average-reward MDPs has been limited, and most prior results focus on…

Optimization and Control · Mathematics 2026-02-10 Jongmin Lee , Ernest K. Ryu

We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…

Machine Learning · Computer Science 2025-03-07 Arsenii Mustafin , Aleksei Pakharev , Alex Olshevsky , Ioannis Ch. Paschalidis

We present a technique for speeding up the convergence of value iteration for partially observable Markov decisions processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov…

Artificial Intelligence · Computer Science 2013-01-30 Nevin Lianwen Zhang , Stephen S. Lee , Weihong Zhang

This paper studies the expected value of multiplicative rewards, where rewards obtained in each step are multiplied (instead of the usual addition), in Markov chains (MCs) and Markov decision processes (MDPs). One of the key differences to…

Logic in Computer Science · Computer Science 2025-06-24 Christel Baier , Krishnendu Chatterjee , Tobias Meggendorfer , Jakob Piribauer

Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). While previous studies have extensively analyzed its convergence properties, they primarily focus on convergence with respect to the infinity norm. In…

Machine Learning · Computer Science 2025-02-06 Arsenii Mustafin , Sebastien Colla , Alex Olshevsky , Ioannis Ch. Paschalidis

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly , Kishan Panaganti , Arnob Ghosh , Adam Wierman

This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic…

Optimization and Control · Mathematics 2017-03-21 Eugene A. Feinberg , Mark E. Lewis
‹ Prev 1 2 3 10 Next ›