Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Zhiyuan Hu; Yunhai Hu; Juncheng Liu; Shuyue Stella Li; Yucheng Wang; Zhen Xu; See-Kiong Ng; Anh Tuan Luu; Xinxing Xu; Bryan Hooi; Cynthia Breazeal; Hae Won Park


Artificial Intelligence; Computation and Language. v2, 2026-01-16



Abstract

Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect training outcomes. MATTRL offers a stable, effective and efficient path to distribution-shift-robust multi-agent reasoning without tuning.
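The abstract names three moving parts (a turn-level experience pool built by credit assignment, retrieval of those experiences back into the dialogue, and a consensus step) but does not say how any of them is implemented. The following is a minimal, hypothetical sketch of how such a loop could fit together, not MATTRL's actual method. Every name (`Experience`, `ExperiencePool`, `assign_turn_credit`, `consensus`, `build_prompt`) is illustrative, and three choices are explicit assumptions: a discounted outcome reward as the credit rule, word-overlap (Jaccard) similarity as the retriever, and majority vote as the consensus mechanism.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One turn-level experience: a textual note plus the credit it earned."""
    text: str
    credit: float


def assign_turn_credit(num_turns: int, outcome_reward: float, gamma: float = 0.9) -> list[float]:
    """ASSUMED credit scheme (one of many possible): propagate the final
    outcome reward back through the dialogue with discount gamma, so turn t
    (0-indexed) receives outcome_reward * gamma ** (num_turns - 1 - t)."""
    return [outcome_reward * gamma ** (num_turns - 1 - t) for t in range(num_turns)]


def _tokens(s: str) -> set[str]:
    # Lowercased whitespace tokens; a stand-in for a real embedding model.
    return set(s.lower().split())


@dataclass
class ExperiencePool:
    """Hypothetical turn-level experience pool with naive lexical retrieval."""
    items: list[Experience] = field(default_factory=list)
    min_credit: float = 0.5  # turns below this credit are not stored

    def add_dialogue(self, turns: list[str], outcome_reward: float) -> None:
        # Score every turn, then keep only those whose credit clears the threshold.
        for text, credit in zip(turns, assign_turn_credit(len(turns), outcome_reward)):
            if credit >= self.min_credit:
                self.items.append(Experience(text, credit))

    def retrieve(self, query: str, k: int = 2) -> list[Experience]:
        q = _tokens(query)

        def score(e: Experience) -> float:
            # Jaccard overlap with the query, weighted by the stored credit.
            t = _tokens(e.text)
            overlap = len(q & t) / len(q | t) if q | t else 0.0
            return overlap * e.credit

        ranked = sorted(self.items, key=score, reverse=True)
        return [e for e in ranked[:k] if score(e) > 0]


def consensus(answers: list[str]) -> str:
    """ASSUMED consensus rule: majority vote over the experts' final answers.
    Counter.most_common lists equal counts in first-encountered order."""
    return Counter(answers).most_common(1)[0][0]


def build_prompt(question: str, pool: ExperiencePool, k: int = 2) -> str:
    """Reinject retrieved experiences into the next round of discussion."""
    notes = "\n".join(f"- {e.text}" for e in pool.retrieve(question, k))
    return f"Relevant past experience:\n{notes}\n\nQuestion: {question}"


if __name__ == "__main__":
    pool = ExperiencePool()
    pool.add_dialogue(
        ["check drug interactions first", "the dose must be renally adjusted", "final answer is B"],
        outcome_reward=1.0,
    )
    print(build_prompt("what dose for renal patients", pool, k=1))
    print(consensus(["B", "A", "B"]))
```

The discounted rule gives later turns more credit, which is only one plausible design. The abstract states that the paper's ablations compare several credit-assignment schemes, so any such rule should be treated as interchangeable in this sketch.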

Keywords

multi-agent reinforcement learning; multi-agent reasoning; reinforcement learning

Cite

@article{arxiv.2601.09667,
  title  = {Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning},
  author = {Zhiyuan Hu and Yunhai Hu and Juncheng Liu and Shuyue Stella Li and Yucheng Wang and Zhen Xu and See-Kiong Ng and Anh Tuan Luu and Xinxing Xu and Bryan Hooi and Cynthia Breazeal and Hae Won Park},
  journal= {arXiv preprint arXiv:2601.09667},
  year   = {2026}
}

Comments

Work in Progress
