Related papers: Analyzing Approximate Value Iteration Algorithms

Parameterized Projected Bellman Operator

Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step…

Machine Learning · Computer Science 2024-03-07 Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration

In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms to solve this equation: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the…

Artificial Intelligence · Computer Science 2025-04-16 Han-Dong Lim, Donghwan Lee

When to stop value iteration: stability and near-optimality versus computation

Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like…

Optimization and Control · Mathematics 2020-11-23 Mathieu Granzotto, Romain Postoyan, Dragan Nešić, Lucian Buşoniu, Jamal Daafouz

Value iteration for approximate dynamic programming under convexity

This paper studies value iteration for infinite horizon contracting Markov decision processes under convexity assumptions and when the state space is uncountable. The original value iteration is replaced with a more tractable form and the…

Optimization and Control · Mathematics 2018-02-21 Jeremy Yee

Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools and techniques that are popular in multi-agent and distributed control scenarios. To counter Bellman's curse of dimensionality, such…

Optimization and Control · Mathematics 2019-05-03 Arunselvan Ramaswamy, Shalabh Bhatnagar, Daniel E. Quevedo

Value Iteration with Guessing for Markov Chains and Markov Decision Processes

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…

Artificial Intelligence · Computer Science 2025-05-13 Krishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona, Jakub Svoboda

Stochastic approximation approaches for CVaR-based variational inequalities

This paper considers variational inequalities (VI) defined by the conditional value-at-risk (CVaR) of uncertain functions and provides three stochastic approximation schemes to solve them. All methods use an empirical estimate of the CVaR…

Optimization and Control · Mathematics 2022-11-16 Jasper Verbree, Ashish Cherukuri

Accelerating Value Iteration with Anchoring

Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a $\mathcal{O}(\gamma^k)$-rate, where $\gamma$ is the discount factor. Surprisingly, however, the optimal rate…

Machine Learning · Computer Science 2023-10-31 Jongmin Lee, Ernest K. Ryu

Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes

While there is an extensive body of research on the analysis of Value Iteration (VI) for discounted cumulative-reward MDPs, prior work on analyzing VI for (undiscounted) average-reward MDPs has been limited, and most prior results focus on…

Optimization and Control · Mathematics 2026-02-10 Jongmin Lee, Ernest K. Ryu

An Empirical Dynamic Programming Algorithm for Continuous MDPs

We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The 'empirical' nature comes from each iteration being done empirically from samples available from…

Optimization and Control · Mathematics 2019-04-25 William B. Haskell, Rahul Jain, Hiteshi Sharma, Pengqian Yu

Global Optimization for Value Function Approximation

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation,…

Artificial Intelligence · Computer Science 2010-06-15 Marek Petrik, Shlomo Zilberstein

Fixed-point iterative algorithm for SVI model

The stochastic volatility inspired (SVI) model is widely used to fit the implied variance smile. Presently, most optimizer algorithms for the SVI model have a strong dependence on the input starting point. In this study, we develop an…

Mathematical Finance · Quantitative Finance 2023-01-20 Shuzhen Yang, Wenqing Zhang

Value Iteration for Simple Stochastic Games: Stopping Criterion and Learning Algorithm

Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is…

Logic in Computer Science · Computer Science 2021-02-02 Edon Kelmendi, Julia Krämer, Jan Kretinsky, Maximilian Weininger

Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games

Multi-time-scale stochastic approximation is an iterative algorithm for finding the fixed point of a set of $N$ coupled operators given their noisy samples. It has been observed that due to the coupling between the decision variables and…

Optimization and Control · Mathematics 2024-09-13 Sihan Zeng, Thinh T. Doan

A First-Order Approach To Accelerated Value Iteration

Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…

Optimization and Control · Mathematics 2021-08-30 Vineet Goyal, Julien Grand-Clement

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…

Artificial Intelligence · Computer Science 2008-08-13 Istvan Szita, Andras Lorincz

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further…

Machine Learning · Statistics 2017-10-31 Tadashi Kozuno, Eiji Uchibe, Kenji Doya

Empirical Dynamic Programming

We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get 'empirical…

Optimization and Control · Mathematics 2013-11-26 William B. Haskell, Rahul Jain, Dileep Kalathil

Stabilizing Value Iteration with and without Approximation Errors

Adaptive optimal control using value iteration (VI) initiated from a stabilizing policy is theoretically analyzed in various aspects including the continuity of the result, the stability of the system operated using any single/constant…

Systems and Control · Computer Science 2015-05-18 Ali Heydari

Adaptive Near-Optimal Rank Tensor Approximation for High-Dimensional Operator Equations

We consider a framework for the construction of iterative schemes for operator equations that combine low-rank approximation in tensor formats and adaptive approximation in a basis. Under fairly general assumptions, we obtain a rigorous…

Numerical Analysis · Mathematics 2014-03-17 Markus Bachmayr, Wolfgang Dahmen