Related papers: Beyond dynamic programming
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision making problems. The goodness of a policy is measured by its value function starting from…
We propose a vector linear programming formulation for a non-stationary, finite-horizon Markov decision process with vector-valued rewards. Pareto efficient policies are shown to correspond to efficient solutions of the linear program, and…
This paper considers an optimal impulse control problem of dynamical systems generated by a flow. The performance criteria are total costs over the infinite time horizon. Apart from the main performance to be minimized, there are multiple…
Real-time inference is a challenge of real-world reinforcement learning due to temporal differences in time-varying environments: the system collects data from the past, updates the decision model in the present, and deploys it in the…
We develop the dynamic programming approach for a family of infinite horizon boundary control problems with linear state equation and convex cost. We prove that the value function of the problem is the unique regular solution of the…
In this work, we propose, for the first time, a reinforcement learning framework specifically designed for zero-sum linear-quadratic stochastic differential games. This approach offers a generalized solution for scenarios in which accurate…
In this paper, we present a novel algorithm named synchronous integral Q-learning, which is based on synchronous policy iteration, to solve the continuous-time infinite horizon optimal control problems of input-affine system dynamics. The…
The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest and for such…
Many real-world optimization problems contain parameters that are unknown before deployment time, either due to stochasticity or to lack of information (e.g., demand or travel times in delivery problems). A common strategy in such cases is…
We extend the standard reinforcement learning framework to random time horizons. While the classical setting typically assumes finite and deterministic or infinite runtimes of trajectories, we argue that multiple real-world applications…
With the recent advancements of technology in facilitating real-time monitoring and data collection, "just-in-time" interventions can be delivered via mobile devices to achieve both real-time and long-term management and control.…
The solution of a constrained linear-quadratic regulator problem is determined by the set of its optimal active sets. We propose an algorithm that constructs this set of active sets for a desired horizon N from that for horizon N-1. While…
As humans, our goals and our environment are persistently changing throughout our lifetime based on our experiences, actions, and internal and external drives. In contrast, typical reinforcement learning problem set-ups consider decision…
This paper develops algorithms for high-dimensional stochastic control problems based on deep learning and dynamic programming. Unlike classical approximate dynamic programming approaches, we first approximate the optimal policy by means of…
We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time infinite horizon…
Discrete optimization problems often arise in deep learning tasks, despite the fact that neural networks typically operate on continuous data. One class of these problems involve objective functions which depend on neural networks, but…
We present one of the first algorithms on model based reinforcement learning and trajectory optimization with free final time horizon. Grounded on the optimal control theory and Dynamic Programming, we derive a set of backward differential…
Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees. In this work, we initiate the study of policy optimization for the…
Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong…
Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (aka. skills) to tackle significantly more…