Related papers: Performance guarantees for model-based Approximate…

Learning continuous Q-Functions using generalized Benders cuts

Q-functions are widely used in discrete-time learning and control to model future costs arising from a given control policy, when the initial state and input are given. Although some of their properties are understood, Q-functions…

Optimization and Control · Mathematics 2019-02-21 Joseph Warrington

Performance Guarantees for Data-Driven Sequential Decision-Making

The solutions to many sequential decision-making problems are characterized by dynamic programming and Bellman's principle of optimality. However, due to the inherent complexity of solving Bellman's equation exactly, there has been…

Systems and Control · Electrical Eng. & Systems 2026-03-24 Bowen Li, Edwin K. P. Chong, Ali Pezeshki

Global Optimization for Value Function Approximation

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation,…

Artificial Intelligence · Computer Science 2010-06-15 Marek Petrik, Shlomo Zilberstein

Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency

Some approaches to solving challenging dynamic programming problems, such as Q-learning, begin by transforming the Bellman equation into an alternative functional equation, in order to open up a new line of attack. Our paper studies this…

Optimization and Control · Mathematics 2019-12-05 Qingyin Ma, John Stachurski

Regularized Q-Learning with Linear Function Approximation

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

Fitted Q-Learning for Relational Domains

We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by representing…

Machine Learning · Computer Science 2020-06-11 Srijita Das, Sriraam Natarajan, Kaushik Roy, Ronald Parr, Kristian Kersting

Accelerated Point-wise Maximum Approach to Approximate Dynamic Programming

We describe an approximate dynamic programming approach to compute lower bounds on the optimal value function for a discrete time, continuous space, infinite horizon setting. The approach iteratively constructs a family of lower bounding…

Systems and Control · Electrical Eng. & Systems 2024-12-20 Paul N. Beuchat, Joseph Warrington, John Lygeros

Generalized Dual Dynamic Programming for Infinite Horizon Problems in Continuous State and Action Spaces

We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time infinite horizon…

Optimization and Control · Mathematics 2018-10-05 Joseph Warrington, Paul N. Beuchat, John Lygeros

Increasing the Action Gap: New Operators for Reinforcement Learning

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this…

Artificial Intelligence · Computer Science 2015-12-16 Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Gradient-Bounded Dynamic Programming for Submodular and Concave Extensible Value Functions with Probabilistic Performance Guarantees

We consider stochastic dynamic programming problems with high-dimensional, discrete state-spaces and finite, discrete-time horizons that prohibit direct computation of the value function from a given Bellman equation for all states and time…

Optimization and Control · Mathematics 2020-06-05 Denis Lebedev, Paul Goulart, Kostas Margellos

Data-Driven Optimal Control of Affine Systems: A Linear Programming Perspective

In this letter, we discuss the problem of optimal control for affine systems in the context of data-driven linear programming. First, we introduce a unified framework for the fixed point characterization of the value function, Q-function…

Systems and Control · Electrical Eng. & Systems 2022-07-12 Andrea Martinelli, Matilde Gargiani, Marina Draskovic, John Lygeros

Guaranteed Bounds for General Approximate Dynamic Programming

In this paper, we will develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of…

Optimization and Control · Mathematics 2014-03-31 Yajing Liu, Edwin K. P. Chong, Ali Pezeshki, Bill Moran

Convergence of Dynamic Programming on the Semidefinite Cone

The goal of this paper is to investigate new and simple convergence analysis of dynamic programming for linear quadratic regulator problem of discrete-time linear time-invariant systems. In particular, bounds on errors are given in terms of…

Optimization and Control · Mathematics 2021-06-18 Donghwan Lee

A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces

In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the…

Optimization and Control · Mathematics 2020-10-23 Insoon Yang

Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions

One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To…

Machine Learning · Computer Science 2024-06-19 Noah Golowich, Ankur Moitra

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal…

Optimization and Control · Mathematics 2013-04-23 Boris Lesner, Bruno Scherrer

Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness,…

Machine Learning · Computer Science 2023-11-14 Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Approximate Dynamic Programming via Sum of Squares Programming

We describe an approximate dynamic programming method for stochastic control problems on infinite state and input spaces. The optimal value function is approximated by a linear combination of basis functions with coefficients as decision…

Optimization and Control · Mathematics 2012-12-07 Tyler H. Summers, Konstantin Kunz, Nikolaos Kariotoglou, Maryam Kamgarpour, Sean Summers, John Lygeros

Data-Driven Performance Guarantees for Parametric Optimization Problems

We propose a data-driven method to establish probabilistic performance guarantees for parametric optimization problems solved via iterative algorithms. Our approach addresses two key challenges: providing convergence guarantees to…

Optimization and Control · Mathematics 2025-10-31 Jingyi Huang, Paul Goulart, Kostas Margellos

Data-driven optimal control with a relaxed linear program

The linear programming (LP) approach has a long history in the theory of approximate dynamic programming. When it comes to computation, however, the LP approach often suffers from poor scalability. In this work, we introduce a relaxed…

Systems and Control · Electrical Eng. & Systems 2020-12-01 Andrea Martinelli, Matilde Gargiani, John Lygeros