Learning the Target Network in Function Space

Machine Learning 2024-09-24 v2 Artificial Intelligence

Abstract

We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.
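
The difference between parameter-space and function-space equivalence can be illustrated with a small sketch. This is not the LR algorithm itself, only a toy example under stated assumptions: the over-parameterised scalar value function v(s) = a * b * s, the helper names `hard_copy` and `replicate`, the step size, and the set of states are all hypothetical choices made for illustration. A hard copy forces the target parameters to equal the online parameters, while a function-space update only asks the target to produce the same values, which different parameters can achieve.

```python
# Toy illustration only. The model, helper names, and hyperparameters are
# assumptions made for this sketch and are not taken from the paper.

def value(params, s):
    """Over-parameterised scalar value function: v(s) = a * b * s."""
    a, b = params
    return a * b * s

def hard_copy(online):
    """Parameter-space update: the target is an exact copy of the online parameters."""
    return tuple(online)

def replicate(target, online, states, lr=0.01, steps=500):
    """Function-space update: gradient descent on the target parameters to
    minimise the mean squared gap between target and online values."""
    a, b = target
    n = len(states)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s in states:
            err = a * b * s - value(online, s)
            grad_a += 2.0 * err * b * s / n
            grad_b += 2.0 * err * a * s / n
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b)

states = [0.5, 1.0, 1.5, 2.0]
online = (2.0, 3.0)   # online parameters, so v(s) = 6 * s
target = (1.0, 1.0)   # initial target parameters

copied = hard_copy(online)
replicated = replicate(target, online, states)

# Largest value difference between the replicated target and the online network.
max_gap = max(abs(value(replicated, s) - value(online, s)) for s in states)

print("hard copy parameters: ", copied)
print("replicated parameters:", replicated)
print("max value gap:        ", max_gap)
```

In this toy setting the replicated target ends up with parameters that differ from the online ones while matching their values on the chosen states, which is the sense in which the two are equivalent in function space but not in parameter space.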

Cite

@article{arxiv.2406.01838,
  title  = {Learning the Target Network in Function Space},
  author = {Kavosh Asadi and Yao Liu and Shoham Sabach and Ming Yin and Rasool Fakoor},
  journal= {arXiv preprint arXiv:2406.01838},
  year   = {2024}
}

Comments

Accepted to the International Conference on Machine Learning (ICML 2024)
