Related papers: Faster Fixed-Point Methods for Multichain MDPs

Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes

While there is an extensive body of research on the analysis of Value Iteration (VI) for discounted cumulative-reward MDPs, prior work on analyzing VI for (undiscounted) average-reward MDPs has been limited, and most prior results focus on…

Optimization and Control · Mathematics 2026-02-10 Jongmin Lee, Ernest K. Ryu

Value Iteration for Long-run Average Reward in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…

Systems and Control · Computer Science 2017-09-01 Pranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca, Jan Křetínský, Tobias Meggendorfer

Value Iteration with Guessing for Markov Chains and Markov Decision Processes

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…

Artificial Intelligence · Computer Science 2025-05-13 Krishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona, Jakub Svoboda

Robust Average-Reward Markov Decision Processes

In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on…

Machine Learning · Computer Science 2023-03-02 Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou

Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain

Learning and optimal control under robust Markov decision processes (MDPs) have received increasing attention, yet most existing theory, algorithms, and applications focus on finite-horizon or discounted models. Long-run average-reward…

Optimization and Control · Mathematics 2025-12-12 Shengbo Wang, Nian Si

A First-Order Approach To Accelerated Value Iteration

Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…

Optimization and Control · Mathematics 2021-08-30 Vineet Goyal, Julien Grand-Clement

Acceleration Operators in the Value Iteration Algorithms for Average Reward Markov Decision Processes

One of the most widely used methods for solving average cost MDP problems is the value iteration method. This method, however, is often computationally impractical and restricted in size of solvable MDP problems. We propose acceleration…

Optimization and Control · Mathematics 2008-06-03 Oleksandr Shlakhter, Chi-Guhn Lee

Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes

We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes with the total expected discounted reward. Inspired by the monotone behavior of the contraction mappings in…

Optimization and Control · Mathematics 2008-03-28 Oleksandr Shlakhter, Chi-Guhn Lee, Dmitry Khmelev, Nasser Jaber

Geometric Re-Analysis of Classical MDP Solving Algorithms

We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…

Machine Learning · Computer Science 2025-03-07 Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor…

Data Structures and Algorithms · Computer Science 2020-12-24 Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye

Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives

A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximative approach is typically preferred over exact techniques, even though…

Artificial Intelligence · Computer Science 2023-04-21 Jan Křetínský, Tobias Meggendorfer, Maximilian Weininger

Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…

Performance · Computer Science 2017-09-08 Jan Křetínský, Tobias Meggendorfer

Generalized Second Order Value Iteration in Markov Decision Processes

Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…

Machine Learning · Computer Science 2021-09-21 Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are…

Optimization and Control · Mathematics 2022-01-19 Shuai Ma, Xiaoteng Ma, Li Xia

Weighted Difference Approximation of Value Functions for Slow-Discounting Markov Decision Processes

Processes (MDPs) often require frequent decision making, that is, taking an action every microsecond, second, or minute. Infinite horizon discount reward formulation is still relevant for a large portion of these applications, because…

Optimization and Control · Mathematics 2014-12-17 Yin-Lam Chow, Junjie Qin

Mean-Variance Optimization of Discrete Time Discounted Markov Decision Processes

In this paper, we study a mean-variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance.…

Optimization and Control · Mathematics 2017-08-24 Li Xia

On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of $(s,S)$ Inventory Policies

This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic…

Optimization and Control · Mathematics 2017-03-21 Eugene A. Feinberg, Mark E. Lewis

Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a…

Machine Learning · Computer Science 2024-10-22 Woojin Chae, Kihyuk Hong, Yufan Zhang, Ambuj Tewari, Dabeen Lee

An Adaptive State Aggregation Algorithm for Markov Decision Processes

Value iteration is a well-known method of solving Markov Decision Processes (MDPs) that is simple to implement and boasts strong theoretical convergence guarantees. However, the computational cost of value iteration quickly becomes…

Machine Learning · Computer Science 2021-07-26 Guanting Chen, Johann Demetrio Gaebler, Matt Peng, Chunlin Sun, Yinyu Ye

A unified view of entropy-regularized Markov decision processes

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to…

Machine Learning · Computer Science 2017-05-23 Gergely Neu, Anders Jonsson, Vicenç Gómez