Towards Parameter-Free Temporal Difference Learning

Yunxiang Li; Mark Schmidt; Reza Babanezhad; Sharan Vaswani


Machine Learning 2026-03-04 v1


Abstract

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However, they often require setting the algorithm's parameters using problem-dependent quantities that are difficult to estimate in practice, such as the minimum eigenvalue of the feature covariance ($\omega$) or the mixing time of the underlying Markov chain ($\tau_{\text{mix}}$). In addition, some analyses rely on nonstandard and impractical modifications, widening the gap between theory and practice. To address these limitations, we use an exponential step-size schedule with the standard TD(0) algorithm. We analyze the resulting method under two sampling regimes: independent and identically distributed (i.i.d.) sampling from the stationary distribution, and the more practical Markovian sampling along a single trajectory. In the i.i.d. setting, the proposed algorithm does not require knowledge of problem-dependent quantities such as $\omega$, and attains the optimal bias-variance trade-off for the last iterate. In the Markovian setting, we propose a regularized TD(0) algorithm with an exponential step-size schedule. The resulting algorithm achieves a convergence rate comparable to prior work, without requiring projections, iterate averaging, or knowledge of $\tau_{\text{mix}}$ or $\omega$.
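The abstract names the ingredients (TD(0), linear features, an exponential step-size schedule, i.i.d. versus Markovian sampling, a regularized variant) but not their exact form. The sketch below is a hedged illustration under stated assumptions, not the authors' implementation. It assumes a geometric schedule $\eta_t = \eta_0 \alpha^t$ with $\alpha = (\beta/T)^{1/T}$, so the step size decays from about $\eta_0$ to $\eta_0 \beta / T$. It also uses a hypothetical L2 shrinkage term `reg` as a stand-in for the paper's regularization. The function and parameter names (`exponential_step_sizes`, `td0_linear`, `td_fixed_point`, `eta0`, `beta`, `reg`) are illustrative only.

```python
import numpy as np


def exponential_step_sizes(eta0, T, beta=1.0):
    # Assumed schedule: eta_t = eta0 * alpha**t for t = 1..T, alpha = (beta/T)**(1/T),
    # so step sizes decay geometrically and the last one equals eta0 * beta / T.
    alpha = (beta / T) ** (1.0 / T)
    return eta0 * alpha ** np.arange(1, T + 1)


def stationary_distribution(P):
    # Left eigenvector of P for eigenvalue 1, normalized to a probability vector.
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    mu = np.clip(mu / mu.sum(), 0.0, None)
    return mu / mu.sum()


def td_fixed_point(P, r, Phi, gamma):
    # TD fixed point: solves Phi^T D (Phi - gamma P Phi) theta = Phi^T D r, D = diag(mu).
    D = np.diag(stationary_distribution(P))
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


def td0_linear(P, r, Phi, gamma, T, eta0=0.5, reg=0.0, markovian=False, seed=0):
    # TD(0) with linear features and the assumed exponential step sizes.
    # reg > 0 adds a hypothetical L2 shrinkage term -reg * theta to each update.
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    mu = stationary_distribution(P)
    theta = np.zeros(d)
    s = rng.choice(n, p=mu)
    for eta in exponential_step_sizes(eta0, T):
        if not markovian:
            s = rng.choice(n, p=mu)  # i.i.d.: fresh state from the stationary distribution
        s_next = rng.choice(n, p=P[s])
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + eta * (td_error * Phi[s] - reg * theta)
        s = s_next  # Markovian: continue along the single trajectory
    return theta  # last iterate: no averaging, no projection


if __name__ == "__main__":
    P = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
    r = np.array([1.0, 0.0, 0.5])
    Phi = np.eye(3)  # tabular features, so theta* is the true value function
    theta_star = td_fixed_point(P, r, Phi, gamma=0.9)
    theta_iid = td0_linear(P, r, Phi, gamma=0.9, T=10_000)
    theta_mkv = td0_linear(P, r, Phi, gamma=0.9, T=10_000, reg=1e-3, markovian=True)
    print(theta_star, theta_iid, theta_mkv)
```

Setting `markovian=True` feeds consecutive states from one trajectory instead of fresh stationary samples. Note that nothing in the sketch takes $\omega$ or $\tau_{\text{mix}}$ as input, which is the property the abstract emphasizes. With `reg > 0`, the iterates target a shrunk version of the TD fixed point rather than the fixed point itself.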

Keywords

machine learning theory, Markov chain, time series classification

Cite

@article{arxiv.2603.02577,
  title  = {Towards Parameter-Free Temporal Difference Learning},
  author = {Yunxiang Li and Mark Schmidt and Reza Babanezhad and Sharan Vaswani},
  journal= {arXiv preprint arXiv:2603.02577},
  year   = {2026}
}
