Regularized Centered Emphatic Temporal Difference Learning

Xingguo Chen; Chaohui Wu; Jinguo Ye; Chao Li; Shangdong Yang; Guang Yang; Tianyu Liang; Wenhao Wang

Artificial Intelligence 2026-05-07 v1

Abstract

Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on emphasis, but the follow-on trace can have high variance. We revisit this tradeoff through Bellman-error centering. Although centering naturally removes a common drift term from TD errors, we show that a naive centered emphatic extension introduces an auxiliary coupling that can destroy the positive-definiteness of the ETD key matrix. We propose Regularized Emphatic Temporal-Difference Learning (RETD), which preserves the follow-on trace and regularizes only the auxiliary centering recursion, corresponding to lifting the lower-right block of the coupled key matrix from $1$ to $1+c$. We derive the RETD core matrix, prove convergence under a conservative sufficient regularization condition, and evaluate the method on diagnostic linear off-policy prediction tasks. The experiments show that RETD avoids the instability of naive centered emphatic learning, preserves favorable emphatic geometry, and exhibits a robust intermediate regime for the regularization parameter $c$ across the diagnostics.
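The effect of lifting a lower-right block from $1$ to $1+c$ can be seen on a toy example. The sketch below is only an illustration under stated assumptions: the 3x3 matrix, its off-diagonal coupling entry, and the helper names `det`, `symmetric_part`, `is_positive_definite`, and `coupled_matrix` are all hypothetical and are not the key matrix or code from the paper. It checks positive-definiteness of a non-symmetric matrix through Sylvester's criterion applied to its symmetric part.

```python
# Toy illustration only: the block entries below are hypothetical and are
# NOT the RETD key matrix derived in the paper.

def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def symmetric_part(m):
    """Return (M + M^T) / 2."""
    n = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]

def is_positive_definite(m):
    """x^T M x > 0 for all nonzero x iff the symmetric part of M is positive
    definite, which is tested here with Sylvester's criterion (all leading
    principal minors strictly positive)."""
    s = symmetric_part(m)
    return all(det([row[:k] for row in s[:k]]) > 0 for k in range(1, len(s) + 1))

def coupled_matrix(c):
    """Hypothetical coupled matrix [[A, u], [v^T, 1 + c]] with A = I (2x2),
    u = (3, 0) as an assumed coupling term, and v = 0."""
    return [
        [1.0, 0.0, 3.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0 + c],
    ]

for c in (0.0, 1.0, 2.0):
    print(c, is_positive_definite(coupled_matrix(c)))
```

In this toy matrix the determinant of the symmetric part is $c - 1.25$, so the coupling breaks positive-definiteness at $c = 0$ and a large enough $c$ restores it. The threshold is a property of the assumed entries only and says nothing about the sufficient condition proved in the paper.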

Keywords

fault tree analysis; reinforcement learning

Cite

@article{arxiv.2605.04100,
  title  = {Regularized Centered Emphatic Temporal Difference Learning},
  author = {Xingguo Chen and Chaohui Wu and Jinguo Ye and Chao Li and Shangdong Yang and Guang Yang and Tianyu Liang and Wenhao Wang},
  journal= {arXiv preprint arXiv:2605.04100},
  year   = {2026}
}
