Related papers: An Adaptive State Aggregation Algorithm for Markov…

Value Iteration with Options and State Aggregation

This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together and…

Artificial Intelligence · Computer Science 2015-01-19 Kamil Ciosek, David Silver

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor…

Data Structures and Algorithms · Computer Science 2020-12-24 Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…

Artificial Intelligence · Computer Science 2011-06-02 N. L. Zhang, W. Zhang

Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…

Performance · Computer Science 2017-09-08 Jan Křetínský, Tobias Meggendorfer

Polynomial Value Iteration Algorithms for Deterministic MDPs

Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat…

Artificial Intelligence · Computer Science 2013-01-07 Omid Madani

On the Complexity of Value Iteration

Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation which is naturally associated to the MDP. At the same time, value…

Formal Languages and Automata Theory · Computer Science 2019-04-30 Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, Mahsa Shirmohammadi

Analysis of Value Iteration Through Absolute Probability Sequences

Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). While previous studies have extensively analyzed its convergence properties, they primarily focus on convergence with respect to the infinity norm. In…

Machine Learning · Computer Science 2025-02-06 Arsenii Mustafin, Sebastien Colla, Alex Olshevsky, Ioannis Ch. Paschalidis

Geometric Re-Analysis of Classical MDP Solving Algorithms

We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…

Machine Learning · Computer Science 2025-03-07 Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions. The…

Machine Learning · Computer Science 2019-06-21 Francis Bach

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any…

Optimization and Control · Mathematics 2019-06-07 Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

A First-Order Approach To Accelerated Value Iteration

Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…

Optimization and Control · Mathematics 2021-08-30 Vineet Goyal, Julien Grand-Clement

Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes

We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes with the total expected discounted reward. Inspired by the monotone behavior of the contraction mappings in…

Optimization and Control · Mathematics 2008-03-28 Oleksandr Shlakhter, Chi-Guhn Lee, Dmitry Khmelev, Nasser Jaber

Approximate Value Iteration for Risk-aware Markov Decision Processes

We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be…

Systems and Control · Computer Science 2017-05-17 Pengqian Yu, William B. Haskell, Huan Xu

Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes

We present a method for solving implicit (factored) Markov decision processes (MDPs) with very large state spaces. We introduce a property of state space partitions which we call epsilon-homogeneity. Intuitively, an epsilon-homogeneous…

Artificial Intelligence · Computer Science 2013-02-08 Thomas L. Dean, Robert Givan, Sonia Leach

Generalized Second Order Value Iteration in Markov Decision Processes

Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…

Machine Learning · Computer Science 2021-09-21 Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

Finite-State Approximations to Discounted and Average Cost Constrained Markov Decision Processes

In this paper, we consider the finite-state approximation of a discrete-time constrained Markov decision process (MDP) under the discounted and average cost criteria. Using the linear programming formulation of the constrained discounted…

Optimization and Control · Mathematics 2018-07-10 Naci Saldi

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…

Artificial Intelligence · Computer Science 2008-08-13 Istvan Szita, Andras Lorincz

Combinatorial Selection with Costly Information

We consider a class of optimization problems over stochastic variables where the algorithm can learn information about the value of any variable through a series of costly steps; we model this information acquisition process as a Markov…

Data Structures and Algorithms · Computer Science 2025-07-25 Shuchi Chawla, Dimitris Christou, Amit Harlev, Ziv Scully

A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes

We present a technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov…

Artificial Intelligence · Computer Science 2013-01-30 Nevin Lianwen Zhang, Stephen S. Lee, Weihong Zhang

Geometric Policy Iteration for Markov Decision Processes

Recently discovered polyhedral structures of the value function for finite state-action discounted Markov decision processes (MDP) shed light on understanding the success of reinforcement learning. We investigate the value function polytope…

Machine Learning · Computer Science 2022-06-27 Yue Wu, Jesús A. De Loera