Related papers: An Adaptive State Aggregation Algorithm for Markov…
This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together and…
In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor…
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…
Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat…
Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation which is naturally associated to the MDP. At the same time, value…
Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). While previous studies have extensively analyzed its convergence properties, they primarily focus on convergence with respect to the infinity norm. In…
We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…
We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions. The…
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any…
Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…
We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes with the total expected discounted reward. Inspired by the monotone behavior of the contraction mappings in…
We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be…
We present a method for solving implicit (factored) Markov decision processes (MDPs) with very large state spaces. We introduce a property of state space partitions which we call epsilon-homogeneity. Intuitively, an epsilon-homogeneous…
Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…
In this paper, we consider the finite-state approximation of a discrete-time constrained Markov decision process (MDP) under the discounted and average cost criteria. Using the linear programming formulation of the constrained discounted…
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…
We consider a class of optimization problems over stochastic variables where the algorithm can learn information about the value of any variable through a series of costly steps; we model this information acquisition process as a Markov…
We present a technique for speeding up the convergence of value iteration for partially observable Markov decisions processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov…
Recently discovered polyhedral structures of the value function for finite state-action discounted Markov decision processes (MDP) shed light on understanding the success of reinforcement learning. We investigate the value function polytope…