Related papers: Dynamic Programming: From Local Optimality to Glob…
The principle of optimality is a fundamental aspect of dynamic programming, which states that the optimal solution to a dynamic optimization problem can be found by combining the optimal solutions to its sub-problems. While this principle…
New approaches to the theory of dynamic programming view dynamic programs as families of policy operators acting on partially ordered sets. In this paper, we extend these ideas by shifting from arbitrary partially ordered sets to ordered…
We study discrete-time finite-horizon optimal control problems in probability spaces, whereby the state of the system is a probability measure. We show that, in many instances, the solution of dynamic programming in probability spaces…
We develop an optimization framework centered around a core idea: once a (parametric) policy is specified, control authority is transferred to the policy, resulting in an autonomous dynamical system. Thus we should be able to optimize…
Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification…
This research considers the ranking and selection with input uncertainty. The objective is to maximize the posterior probability of correctly selecting the best alternative under a fixed simulation budget, where each alternative is measured…
The goal of policy gradient approaches is to find a policy in a given class of policies which maximizes the expected return. Given a differentiable model of the policy, we want to apply a gradient-ascent technique to reach a local optimum.…
We are witnessing an increasing use of data-driven predictive models to inform decisions. As decisions have implications for individuals and society, there is increasing pressure on decision makers to be transparent about their decision…
Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a paramet erized policy space in order to maximize the associated value function averaged over some…
Direct optimization is an appealing framework that replaces integration with optimization of a random objective for approximating gradients in models with discrete random variables. A$^\star$ sampling is a framework for optimizing such…
Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods…
A model among many may only be best under certain states of the world. Switching from a model to another can also be costly. Finding a procedure to dynamically choose a model in these circumstances requires to solve a complex estimation…
We consider a broad class of dynamic programming (DP) problems that involve a partially linear structure and some positivity properties in their system equation and cost function. We address deterministic and stochastic problems, possibly…
We will consider all policies of the agent and will prove that one of them is the best performing policy. While that policy is not computable, computable policies do exist in its proximity. We will define AI as a computable policy which is…
We consider policy gradient methods for stochastic optimal control problem in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous time limit of the policy gradient method. We prove the global…
This paper extends the core results of discrete time infinite horizon dynamic programming to the case of state-dependent discounting. We obtain a condition on the discount factor process under which all of the standard optimality results…
We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a…
We introduce the first direct policy search algorithm which provably converges to the globally optimal $\textit{dynamic}$ filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial…
We introduce a framework that represents a dynamic program as a family of operators acting on a partially ordered set. We provide an optimality theory based only on order-theoretic assumptions and show how applications across almost all…
Dual control denotes a class of control problems where the parameters governing the system are imperfectly known. The challenge is to find the optimal balance between probing, i.e. exciting the system to understand it more, and caution,…