Related papers: Parameterized Projected Bellman Operator
In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart as the approximate value…
In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms to solve this equation: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the…
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Due to the advantage that appropriate function approximators…
For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality…
Policy evaluation is a key process in Reinforcement Learning (RL). It assesses a given policy by estimating the corresponding value function. When using parameterized value functions, common approaches minimize the sum of squared Bellman…
Preferential Bayesian Optimization (PBO) is a sample-efficient method to learn latent user utilities from preferential feedback over a pair of designs. It relies on a statistical surrogate model for the latent function, usually a Gaussian…
Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the…
We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator. Our Bayesian…
The performance of deep (reinforcement) learning systems crucially depends on the choice of hyperparameters. Their tuning is notoriously expensive, typically requiring an iterative training process to run for numerous steps to convergence.…
We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The `empirical' nature comes from each iteration being done empirically from samples available from…
This paper investigates the application of Reinforcement Learning (RL) to optimise call routing in call centres to minimise client waiting time and staff idle time. Two methods are compared: a model-based approach using Value Iteration (VI)…
We develop a parameterized Primal-Dual $\pi$ Learning method based on deep neural networks for Markov decision process with large state space and off-policy reinforcement learning. In contrast to the popular Q-learning and actor-critic…
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get `empirical…
One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To…
Reinforcement Learning (RL) has made significant strides in various domains, and policy gradient methods like Proximal Policy Optimization (PPO) have gained popularity due to their balance in performance, training stability, and…
Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman…
Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value…
Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models…
Bayesian Optimization (BO) is a popular approach to optimizing expensive-to-evaluate black-box functions. Despite the success of BO, its performance may decrease exponentially as the dimensionality increases. A common framework to tackle…