Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Machine Learning 2022-11-28 v2 Artificial Intelligence Machine Learning

Abstract

Offline reinforcement learning, which aims to optimize sequential decision-making strategies from historical data, has been widely applied in real-world settings. State-of-the-art algorithms usually leverage powerful function approximators (e.g., neural networks) to alleviate the sample complexity hurdle and achieve better empirical performance. Despite these successes, a systematic understanding of the statistical complexity of function approximation is still lacking. As a step towards bridging this gap, we consider offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structures. Most importantly, we show that offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide a theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we improve our guarantee with a tighter instance-dependent characterization. We hope our work draws interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.

Cite

@article{arxiv.2210.00750,
  title  = {Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient},
  author = {Ming Yin and Mengdi Wang and Yu-Xiang Wang},
  journal= {arXiv preprint arXiv:2210.00750},
  year   = {2022}
}