Related papers: Faster Fixed-Point Methods for Multichain MDPs
While there is an extensive body of research on the analysis of Value Iteration (VI) for discounted cumulative-reward MDPs, prior work on analyzing VI for (undiscounted) average-reward MDPs has been limited, and most prior results focus on…
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…
Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on…
Learning and optimal control under robust Markov decision processes (MDPs) have received increasing attention, yet most existing theory, algorithms, and applications focus on finite-horizon or discounted models. Long-run average-reward…
Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…
One of the most widely used methods for solving average cost MDP problems is the value iteration method. This method, however, is often computationally impractical and restricted in size of solvable MDP problems. We propose acceleration…
We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes with the total expected discounted reward. Inspired by the monotone behavior of the contraction mappings in…
We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…
In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor…
A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximative approach is typically preferred over exact techniques, even though…
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…
Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…
This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are…
Processes (MDPs) often require frequent decision making, that is, taking an action every microsecond, second, or minute. Infinite horizon discount reward formulation is still relevant for a large portion of these applications, because…
In this paper, we study a mean-variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance.…
This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic…
This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a…
Value iteration is a well-known method of solving Markov Decision Processes (MDPs) that is simple to implement and boasts strong theoretical convergence guarantees. However, the computational cost of value iteration quickly becomes…
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to…