Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

Machine Learning | 2021-08-09 | v3 | Artificial Intelligence, Optimization and Control, Machine Learning

Abstract

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, which we call Neural TD learning. We consider two algorithms used in practice, projection-free and max-norm regularized Neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms, in terms of both sample complexity and overparameterization. In particular, we prove that max-norm regularization improves on state-of-the-art sample complexity and overparameterization bounds. The results in this work rely on a novel Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
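
To make the two variants concrete, the following is a minimal sketch of a single TD(0) update with a neural value function. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the architecture, so the two-layer ReLU network with fixed random output signs, the reading of max-norm regularization as projecting each hidden unit's weights onto a ball around its initial value, and all names and parameters (`init_network`, `neural_td_step`, `radius`, `alpha`, `gamma`) are hypothetical.

```python
import numpy as np


def init_network(m, d, rng):
    """Assumed setup: m hidden ReLU units, input dimension d, fixed +/-1 output signs."""
    W0 = rng.standard_normal((m, d))      # hidden-layer weights (the trained parameters)
    c = rng.choice([-1.0, 1.0], size=m)   # output-layer signs, kept fixed in this sketch
    return W0, c


def value(W, c, s):
    """Value estimate f(s; W) = (1/sqrt(m)) * sum_i c_i * relu(w_i . s)."""
    return float(c @ np.maximum(W @ s, 0.0)) / np.sqrt(len(c))


def grad_value(W, c, s):
    """Gradient of f(s; W) with respect to W; row i is c_i * 1[w_i . s > 0] * s / sqrt(m)."""
    active = (W @ s > 0).astype(float)
    return (c * active)[:, None] * s[None, :] / np.sqrt(len(c))


def project_max_norm(W, W0, radius):
    """Assumed form of max-norm regularization: keep each row within `radius` of its initial value."""
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W0 + diff * scale


def neural_td_step(W, W0, c, s, r, s_next, gamma, alpha, radius=None):
    """One semi-gradient TD(0) step; radius=None gives the projection-free variant."""
    delta = r + gamma * value(W, c, s_next) - value(W, c, s)  # TD error
    W_new = W + alpha * delta * grad_value(W, c, s)
    if radius is not None:
        W_new = project_max_norm(W_new, W0, radius)
    return W_new
```

Calling `neural_td_step` with `radius=None` leaves the parameters unconstrained, while passing a finite `radius` bounds how far each hidden unit can move from its initialization. That constraint is the kind of control on the parameter trajectory that the abstract credits with the improved bounds.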

Cite

@article{arxiv.2103.01391,
  title   = {Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation},
  author  = {Semih Cayci and Siddhartha Satpathi and Niao He and R. Srikant},
  journal = {arXiv preprint arXiv:2103.01391},
  year    = {2021}
}