Related papers: Dynamic Programming: From Local Optimality to Glob…

The Principle of Optimality in Dynamic Programming: A Pedagogical Note

The principle of optimality is a fundamental aspect of dynamic programming, which states that the optimal solution to a dynamic optimization problem can be found by combining the optimal solutions to its sub-problems. While this principle…

Optimization and Control · Mathematics 2024-08-14 Bar Light

Dynamic Programming in Ordered Vector Space

New approaches to the theory of dynamic programming view dynamic programs as families of policy operators acting on partially ordered sets. In this paper, we extend these ideas by shifting from arbitrary partially ordered sets to ordered…

Optimization and Control · Mathematics 2026-02-02 Nisha Peng, John Stachurski

Dynamic Programming in Probability Spaces via Optimal Transport

We study discrete-time finite-horizon optimal control problems in probability spaces, whereby the state of the system is a probability measure. We show that, in many instances, the solution of dynamic programming in probability spaces…

Optimization and Control · Mathematics 2024-04-09 Antonio Terpin, Nicolas Lanzetti, Florian Dörfler

Dynamical System Optimization

We develop an optimization framework centered around a core idea: once a (parametric) policy is specified, control authority is transferred to the policy, resulting in an autonomous dynamical system. Thus we should be able to optimize…

Machine Learning · Computer Science 2025-06-11 Emo Todorov

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification…

Machine Learning · Computer Science 2023-10-24 Adrien Bolland, Gilles Louppe, Damien Ernst

Efficient Dynamic Allocation Policy for Robust Ranking and Selection under Stochastic Control Framework

This research considers the ranking and selection with input uncertainty. The objective is to maximize the posterior probability of correctly selecting the best alternative under a fixed simulation budget, where each alternative is measured…

Optimization and Control · Mathematics 2023-05-15 Hui Xiao, Zhihong Wei

On Policy Gradients

The goal of policy gradient approaches is to find a policy in a given class of policies which maximizes the expected return. Given a differentiable model of the policy, we want to apply a gradient-ascent technique to reach a local optimum.…

Machine Learning · Computer Science 2019-11-13 Mattis Manfred Kämmerer

Optimal Decision Making Under Strategic Behavior

We are witnessing an increasing use of data-driven predictive models to inform decisions. As decisions have implications for individuals and society, there is increasing pressure on decision makers to be transparent about their decision…

Machine Learning · Computer Science 2024-02-26 Stratis Tsirtsis, Behzad Tabibian, Moein Khajehnejad, Adish Singla, Bernhard Schölkopf, Manuel Gomez-Rodriguez

Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some…

Machine Learning · Computer Science 2013-06-07 Bruno Scherrer, Matthieu Geist

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Direct optimization is an appealing framework that replaces integration with optimization of a random objective for approximating gradients in models with discrete random variables. A$^\star$ sampling is a framework for optimizing such…

Machine Learning · Computer Science 2020-10-26 Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods…

Machine Learning · Computer Science 2018-10-08 Peter Henderson, Joshua Romoff, Joelle Pineau

Action-State Dependent Dynamic Model Selection

A model among many may only be best under certain states of the world. Switching from a model to another can also be costly. Finding a procedure to dynamically choose a model in these circumstances requires to solve a complex estimation…

Machine Learning · Computer Science 2023-10-10 Francesco Cordoni, Alessio Sancetta

Semilinear Dynamic Programming: Analysis, Algorithms, and Certainty Equivalence Properties

We consider a broad class of dynamic programming (DP) problems that involve a partially linear structure and some positivity properties in their system equation and cost function. We address deterministic and stochastic problems, possibly…

Optimization and Control · Mathematics 2026-04-21 Yuchao Li, Dimitri Bertsekas

The AI Definition and a Program Which Satisfies this Definition

We will consider all policies of the agent and will prove that one of them is the best performing policy. While that policy is not computable, computable policies do exist in its proximity. We will define AI as a computable policy which is…

Artificial Intelligence · Computer Science 2025-07-25 Dimiter Dobrev

A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

We consider policy gradient methods for stochastic optimal control problem in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous time limit of the policy gradient method. We prove the global…

Optimization and Control · Mathematics 2025-04-15 Mo Zhou, Jianfeng Lu

Dynamic Programming with State-Dependent Discounting

This paper extends the core results of discrete time infinite horizon dynamic programming to the case of state-dependent discounting. We obtain a condition on the discount factor process under which all of the standard optimality results…

General Economics · Economics 2020-10-15 John Stachurski, Junnan Zhang

Hierarchical model-based policy optimization: from actions to action sequences and back

We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a…

Machine Learning · Computer Science 2020-01-03 Daniel McNamee

Globally Convergent Policy Search over Dynamic Filters for Output Estimation

We introduce the first direct policy search algorithm which provably converges to the globally optimal $\textit{dynamic}$ filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial…

Optimization and Control · Mathematics 2022-03-01 Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ Tedrake

Dynamic Programs on Partially Ordered Sets

We introduce a framework that represents a dynamic program as a family of operators acting on a partially ordered set. We provide an optimality theory based only on order-theoretic assumptions and show how applications across almost all…

Optimization and Control · Mathematics 2025-01-07 Thomas J. Sargent, John Stachurski

A continuous time formulation of stochastic dual control to avoid the curse of dimensionality

Dual control denotes a class of control problems where the parameters governing the system are imperfectly known. The challenge is to find the optimal balance between probing, i.e. exciting the system to understand it more, and caution,…

Optimization and Control · Mathematics 2020-04-29 Martin Péron, Christopher M. Baker, Barry D. Hughes, Iadine Chadès