Related papers: Parameterized Projected Bellman Operator

Analyzing Approximate Value Iteration Algorithms

In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value…

Systems and Control · Computer Science 2021-06-01 Arunselvan Ramaswamy, Shalabh Bhatnagar

Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration

In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms to solve this equation: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the…

Artificial Intelligence · Computer Science 2025-04-16 Han-Dong Lim, Donghwan Lee

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Due to the advantage that appropriate function approximators…

Machine Learning · Computer Science 2022-03-14 Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality…

Machine Learning · Computer Science 2025-08-14 Motoki Omura, Kazuki Ota, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

Policy evaluation is a key process in Reinforcement Learning (RL). It assesses a given policy by estimating the corresponding value function. When using parameterized value functions, common approaches minimize the sum of squared Bellman…

Machine Learning · Computer Science 2020-02-19 Shirli Di-Castro Shashua, Shie Mannor

PABBO: Preferential Amortized Black-Box Optimization

Preferential Bayesian Optimization (PBO) is a sample-efficient method to learn latent user utilities from preferential feedback over a pair of designs. It relies on a statistical surrogate model for the latent function, usually a Gaussian…

Machine Learning · Statistics 2025-03-04 Xinyu Zhang, Daolang Huang, Samuel Kaski, Julien Martinelli

Incremental Sampling-based Motion Planners Using Policy Iteration Methods

Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the…

Robotics · Computer Science 2016-09-21 Oktay Arslan, Panagiotis Tsiotras

Bayesian Bellman Operators

We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator. Our Bayesian…

Machine Learning · Computer Science 2021-06-17 Matthew Fellows, Kristian Hartikainen, Shimon Whiteson

Bayesian Optimization for Iterative Learning

The performance of deep (reinforcement) learning systems crucially depends on the choice of hyperparameters. Their tuning is notoriously expensive, typically requiring an iterative training process to run for numerous steps to convergence.…

Machine Learning · Computer Science 2021-01-19 Vu Nguyen, Sebastian Schulze, Michael A Osborne

An Empirical Dynamic Programming Algorithm for Continuous MDPs

We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The 'empirical' nature comes from each iteration being done empirically from samples available from…

Optimization and Control · Mathematics 2019-04-25 William B. Haskell, Rahul Jain, Hiteshi Sharma, Pengqian Yu

Optimising Call Centre Operations using Reinforcement Learning: Value Iteration versus Proximal Policy Optimisation

This paper investigates the application of Reinforcement Learning (RL) to optimise call routing in call centres to minimise client waiting time and staff idle time. Two methods are compared: a model-based approach using Value Iteration (VI)…

Artificial Intelligence · Computer Science 2025-07-25 Kwong Ho Li, Wathsala Karunarathne

Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality

We develop a parameterized Primal-Dual $\pi$ Learning method based on deep neural networks for Markov decision process with large state space and off-policy reinforcement learning. In contrast to the popular Q-learning and actor-critic…

Machine Learning · Computer Science 2017-12-08 Woon Sang Cho, Mengdi Wang

Empirical Dynamic Programming

We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get 'empirical…

Optimization and Control · Mathematics 2013-11-26 William B. Haskell, Rahul Jain, Dileep Kalathil

Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions

One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To…

Machine Learning · Computer Science 2024-06-19 Noah Golowich, Ankur Moitra

KIPPO: Koopman-Inspired Proximal Policy Optimization

Reinforcement Learning (RL) has made significant strides in various domains, and policy gradient methods like Proximal Policy Optimization (PPO) have gained popularity due to their balance in performance, training stability, and…

Machine Learning · Computer Science 2025-05-21 Andrei Cozma, Landon Harris, Hairong Qi

Bellman Calibration for $V$-Learning in Offline Reinforcement Learning

Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman…

Machine Learning · Statistics 2026-05-11 Lars van der Laan, Nathan Kallus

Value Iteration with Guessing for Markov Chains and Markov Decision Processes

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…

Artificial Intelligence · Computer Science 2025-05-13 Krishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona, Jakub Svoboda

Learning Near Optimal Policies with Low Inherent Bellman Error

We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value…

Machine Learning · Computer Science 2020-06-30 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language

Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models…

Machine Learning · Computer Science 2019-07-05 Willie Neiswanger, Kirthevasan Kandasamy, Barnabas Poczos, Jeff Schneider, Eric Xing

High-Dimensional Bayesian Optimization via Random Projection of Manifold Subspaces

Bayesian Optimization (BO) is a popular approach to optimizing expensive-to-evaluate black-box functions. Despite the success of BO, its performance may decrease exponentially as the dimensionality increases. A common framework to tackle…

Machine Learning · Computer Science 2024-12-24 Quoc-Anh Hoang Nguyen, The Hung Tran