Data-Efficient Quadratic Q-Learning Using LMIs

J. S. van Hulst; W. P. M. H. Heemels; D. J. Antunes

doi:10.1109/CDC56724.2024.10886653

Systems and Control, 2025-04-09, v1

Abstract

Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these methods, the Q-function is chosen to be linear in the parameters and quadratic in selected basis functions in the state and control deviations from a base policy. A cost penalizing the $\ell_1$-norm of Bellman errors is minimized. We propose two methods: Linear Matrix Inequality Q-Learning (LMI-QL) and its iterative variant (LMI-QLi), which solve the resulting episodic optimization problem through convex optimization. LMI-QL relies on a convex relaxation that yields a semidefinite programming (SDP) problem with linear matrix inequalities (LMIs). LMI-QLi entails solving sequential iterations of an SDP problem. Both methods combine convex optimization with direct Q-function learning, significantly improving learning speed. A numerical case study demonstrates their advantages over existing parametrized Q-learning methods.
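To make the objective described in the abstract concrete, the following is a minimal sketch of a quadratic Q-function and an $\ell_1$ Bellman-error cost evaluated for a fixed parameter matrix. It is not the paper's implementation: the names `basis`, `base_policy`, `gamma`, and the choice to stack the state basis functions with the control deviation into one vector `z` are assumptions made for illustration, and the SDP/LMI step that actually optimizes `P` is not shown.

```python
import numpy as np

def features(x, u, base_policy, basis):
    """Stack state basis functions with the control deviation from the base policy."""
    return np.concatenate([basis(x), u - base_policy(x)])

def q_value(P, x, u, base_policy, basis):
    """Q(x, u) = z^T P z: quadratic in z, linear in the entries of P."""
    z = features(x, u, base_policy, basis)
    return float(z @ P @ z)

def min_q_over_u(P, x, basis):
    """Minimum of z^T P z over the control deviation (assumes P_uu is positive definite)."""
    phi = basis(x)
    n = phi.size
    Pxx, Pxu, Puu = P[:n, :n], P[:n, n:], P[n:, n:]
    # Schur complement gives the minimum in closed form.
    schur = Pxx - Pxu @ np.linalg.solve(Puu, Pxu.T)
    return float(phi @ schur @ phi)

def l1_bellman_cost(P, transitions, base_policy, basis, gamma=1.0):
    """Sum of absolute Bellman errors over (x, u, stage_cost, x_next) tuples."""
    total = 0.0
    for x, u, stage_cost, x_next in transitions:
        target = stage_cost + gamma * min_q_over_u(P, x_next, basis)
        total += abs(q_value(P, x, u, base_policy, basis) - target)
    return total

# Illustrative check on a scalar linear system x+ = x + u with stage cost
# x^2 + u^2 (a hypothetical example, not taken from the paper).
identity_basis = lambda x: x
zero_policy = lambda x: np.zeros(1)
p = (1.0 + np.sqrt(5.0)) / 2.0                # solves p^2 = 1 + p (scalar Riccati equation)
P_opt = np.array([[1.0 + p, p], [p, 1.0 + p]])
data = [
    (np.array([x]), np.array([u]), x * x + u * u, np.array([x + u]))
    for x, u in [(1.0, 0.5), (-2.0, 1.0), (0.3, -0.7)]
]
cost_at_optimum = l1_bellman_cost(P_opt, data, zero_policy, identity_basis)
cost_perturbed = l1_bellman_cost(P_opt + 0.1 * np.eye(2), data, zero_policy, identity_basis)
```

In this toy setting the exact Q-function of the linear-quadratic problem gives (numerically) zero Bellman error on every transition, while a perturbed `P` does not. Because `q_value` is linear in the entries of `P`, a cost of this form is a natural candidate for convex reformulation, which is the role the abstract assigns to the SDP.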

Keywords

reinforcement learning; matrix optimization and algorithms; system identification and control

Cite

@article{arxiv.2409.11986,
  title  = {Data-Efficient Quadratic Q-Learning Using LMIs},
  author = {J. S. van Hulst and W. P. M. H. Heemels and D. J. Antunes},
  journal= {arXiv preprint arXiv:2409.11986},
  year   = {2025}
}

Comments

Accepted for presentation at the 63rd IEEE Conference on Decision and Control (CDC 2024), Milan, Italy, 2024
