Temporal Difference Flows

Machine Learning, Artificial Intelligence | 2025-03-14 | v1

Abstract

Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.

Cite

@article{arxiv.2503.09817,
  title  = {Temporal Difference Flows},
  author = {Jesse Farebrother and Matteo Pirotta and Andrea Tirinzoni and Rémi Munos and Alessandro Lazaric and Ahmed Touati},
  journal= {arXiv preprint arXiv:2503.09817},
  year   = {2025}
}