Beyond dynamic programming

Abhinav Muraleedharan

Machine Learning 2023-06-28 v1 Artificial Intelligence Robotics

Authors: Abhinav Muraleedharan

Abstract

In this paper, we present Score-life programming, a novel theoretical approach for solving reinforcement learning problems. In contrast to classical dynamic programming-based methods, our method can search over non-stationary policy functions and can directly compute optimal infinite-horizon action sequences from a given state. The central idea of our method is the construction of a mapping between infinite-horizon action sequences and real numbers in a bounded interval. This construction enables us to formulate an optimization problem for directly computing optimal infinite-horizon action sequences, without requiring a policy function. We demonstrate the effectiveness of our approach by applying it to nonlinear optimal control problems. Overall, our contributions provide a novel theoretical framework for formulating and solving reinforcement learning problems.
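
To make the central idea concrete, the sketch below shows one simple way a sequence of actions can be identified with a real number in a bounded interval. It is an illustration under stated assumptions, not necessarily the paper's exact construction: it assumes a finite action set indexed 0..k-1 and reads the sequence as the base-k digits of a number in [0, 1). The function names `encode` and `decode` and the parameter `num_actions` are hypothetical, and only a finite prefix of the infinite sequence is handled.

```python
from fractions import Fraction


def encode(actions, num_actions=2):
    """Map a finite prefix of an action sequence to a number in [0, 1).

    Assumption: actions are integers in {0, ..., num_actions - 1}, and the
    sequence is read as the base-`num_actions` digits after the radix point.
    """
    value = Fraction(0)
    for t, a in enumerate(actions):
        if not 0 <= a < num_actions:
            raise ValueError(f"action {a} is outside 0..{num_actions - 1}")
        value += Fraction(a, num_actions ** (t + 1))
    return value


def decode(value, horizon, num_actions=2):
    """Recover the first `horizon` actions from a number in [0, 1)."""
    actions = []
    for _ in range(horizon):
        value *= num_actions
        digit = int(value)  # integer part is the next action index
        actions.append(digit)
        value -= digit
    return actions


# Round trip on a short binary action sequence.
number = encode([1, 0, 1])
print(number, decode(number, horizon=3))
```

Under such an encoding, searching over action sequences can in principle be rephrased as searching over a single bounded real variable, which is the kind of reformulation the abstract refers to. Exact rational arithmetic (`Fraction`) is used here so that the round trip does not suffer from floating-point rounding.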

Keywords

reinforcement learning, optimization algorithm, game theory

Cite

@article{arxiv.2306.15029,
  title  = {Beyond dynamic programming},
  author = {Abhinav Muraleedharan},
  journal= {arXiv preprint arXiv:2306.15029},
  year   = {2023}
}

Comments

17 pages.
Colab notebook: https://colab.research.google.com/drive/1GKIMieKrYLX_YXnUOFuEvHwk8CH26zVu?usp=sharing
GitHub repository: https://github.com/Abhinav-Muraleedharan/Beyond_Dynamic_Programming.git
