Related papers: Performance guarantees for model-based Approximate…
Q-functions are widely used in discrete-time learning and control to model future costs arising from a given control policy, when the initial state and input are given. Although some of their properties are understood, Q-functions…
The solutions to many sequential decision-making problems are characterized by dynamic programming and Bellman's principle of optimality. However, due to the inherent complexity of solving Bellman's equation exactly, there has been…
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation,…
Some approaches to solving challenging dynamic programming problems, such as Q-learning, begin by transforming the Bellman equation into an alternative functional equation, in order to open up a new line of attack. Our paper studies this…
Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…
We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by representing…
We describe an approximate dynamic programming approach to compute lower bounds on the optimal value function for a discrete time, continuous space, infinite horizon setting. The approach iteratively constructs a family of lower bounding…
We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time infinite horizon…
This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this…
We consider stochastic dynamic programming problems with high-dimensional, discrete state-spaces and finite, discrete-time horizons that prohibit direct computation of the value function from a given Bellman equation for all states and time…
In this letter, we discuss the problem of optimal control for affine systems in the context of data-driven linear programming. First, we introduce a unified framework for the fixed point characterization of the value function, Q-function…
In this paper, we will develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of…
The goal of this paper is to investigate new and simple convergence analysis of dynamic programming for linear quadratic regulator problem of discrete-time linear time-invariant systems. In particular, bounds on errors are given in terms of…
In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the…
One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To…
We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal…
In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness,…
We describe an approximate dynamic programming method for stochastic control problems on infinite state and input spaces. The optimal value function is approximated by a linear combination of basis functions with coefficients as decision…
We propose a data-driven method to establish probabilistic performance guarantees for parametric optimization problems solved via iterative algorithms. Our approach addresses two key challenges: providing convergence guarantees to…
The linear programming (LP) approach has a long history in the theory of approximate dynamic programming. When it comes to computation, however, the LP approach often suffers from poor scalability. In this work, we introduce a relaxed…