Related papers: Optimistic Value Iteration
Computing reachability probabilities is at the heart of probabilistic model checking. All model checkers compute these probabilities in an iterative fashion using value iteration. This technique approximates a fixed point from below by…
Solving partially observable Markov decision processes (POMDPs) typically requires reasoning about the values of exponentially many state beliefs. Towards practical performance, state-of-the-art solvers use value bounds to guide this…
Algorithmic analysis of Markov decision processes (MDP) and stochastic games (SG) in practice relies on value-iteration (VI) algorithms. Since basic VI does not provide guarantees on the precision of the result, variants of VI have been…
Algorithmic analysis of Markov decision processes (MDP) and stochastic games (SG) in practice relies on value-iteration (VI) algorithms. Since the basic version of VI does not provide guarantees on the precision of the result, variants of…
Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to…
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…
Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm which enables to optimize an alternative objective: the probability that the gain is…
We study algorithms using randomized value functions for exploration in reinforcement learning. This type of algorithms enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a…
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…
Markov decision processes are useful models of concurrency optimisation problems, but are often intractable for exhaustive verification methods. Recent work has introduced lightweight approximative techniques that sample directly from…
We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This…
Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive…
While value iteration (VI) is a standard solution approach to simple stochastic games (SSGs), it suffered from the lack of a stopping criterion. Recently, several solutions have appeared, among them also "optimistic" VI (OVI). However, OVI…
Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). While previous studies have extensively analyzed its convergence properties, they primarily focus on convergence with respect to the infinity norm. In…
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…
Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this…
We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown policy iteration style algorithms have exponential lower bounds in a two player game setting. We extend these lower bounds to Markov…
We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically…
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given…