
Dual Approximation Policy Optimization

Machine Learning 2024-10-03 v1

Abstract

We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.


Cite

@article{arxiv.2410.01249,
  title  = {Dual Approximation Policy Optimization},
  author = {Zhihan Xiong and Maryam Fazel and Lin Xiao},
  journal= {arXiv preprint arXiv:2410.01249},
  year   = {2024}
}

Comments

30 pages, 2 figures
