Related papers: Analyzing Approximate Value Iteration Algorithms
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step…
In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms to solve this equation: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the…
Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like…
This paper studies value iteration for infinite horizon contracting Markov decision processes under convexity assumptions and when the state space is uncountable. The original value iteration is replaced with a more tractable form and the…
Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools and techniques that are popular in multi-agent and distributed control scenarios. To counter Bellman's curse of dimensionality, such…
Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…
This paper considers variational inequalities (VI) defined by the conditional value-at-risk (CVaR) of uncertain functions and provides three stochastic approximation schemes to solve them. All methods use an empirical estimate of the CVaR…
Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a $\mathcal{O}(\gamma^k)$-rate, where $\gamma$ is the discount factor. Surprisingly, however, the optimal rate…
While there is an extensive body of research on the analysis of Value Iteration (VI) for discounted cumulative-reward MDPs, prior work on analyzing VI for (undiscounted) average-reward MDPs has been limited, and most prior results focus on…
We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The `empirical' nature comes from each iteration being done empirically from samples available from…
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation,…
The stochastic volatility inspired (SVI) model is widely used to fit the implied variance smile. Presently, most optimizer algorithms for the SVI model have a strong dependence on the input starting point. In this study, we develop an…
Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is…
Multi-time-scale stochastic approximation is an iterative algorithm for finding the fixed point of a set of $N$ coupled operators given their noisy samples. It has been observed that due to the coupling between the decision variables and…
Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.…
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further…
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get `empirical…
Adaptive optimal control using value iteration (VI) initiated from a stabilizing policy is theoretically analyzed in various aspects including the continuity of the result, the stability of the system operated using any single/constant…
We consider a framework for the construction of iterative schemes for operator equations that combine low-rank approximation in tensor formats and adaptive approximation in a basis. Under fairly general assumptions, we obtain a rigorous…