A Policy Optimization Method Towards Optimal-time Stability

Shengjie Wang; Fengbo Lan; Xiang Zheng; Yuxue Cao; Oluwatosin Oseni; Haotian Xu; Tao Zhang; Yang Gao

Robotics 2023-10-16 v2 Machine Learning

Abstract

In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.

Keywords

policy gradient; reinforcement learning; control theory

Cite

@article{arxiv.2301.00521,
  title  = {A Policy Optimization Method Towards Optimal-time Stability},
  author = {Shengjie Wang and Fengbo Lan and Xiang Zheng and Yuxue Cao and Oluwatosin Oseni and Haotian Xu and Tao Zhang and Yang Gao},
  journal= {arXiv preprint arXiv:2301.00521},
  year   = {2023}
}

Comments

27 pages, 11 figures. 7th Annual Conference on Robot Learning, 2023.
