Test-time Recursive Thinking: Self-Improvement without External Feedback

Computation and Language 2026-02-04 v1

Abstract

Modern Large Language Models (LLMs) have shown rapid improvements in reasoning capabilities, driven largely by reinforcement learning (RL) with verifiable rewards. Here, we ask whether these LLMs can self-improve without additional training. We identify two core challenges for such systems: (i) efficiently generating diverse, high-quality candidate solutions, and (ii) reliably selecting correct answers in the absence of ground-truth supervision. To address these challenges, we propose Test-time Recursive Thinking (TRT), an iterative self-improvement framework that conditions generation on rollout-specific strategies, accumulated knowledge, and self-generated verification signals. With TRT, open-source models reach 100% accuracy on AIME-25 and AIME-24, and closed-source models improve by 10.4–14.8 percentage points on LiveCodeBench's most difficult problems without external feedback.
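The loop below is a minimal toy sketch of the two challenges named above, assuming a simplified setup that is not the authors' implementation. Each round produces one rollout per strategy, conditions every rollout on a shared list of accumulated notes, and keeps the candidate with the best self-assigned score rather than checking against a ground-truth label. All names (`recursive_thinking`, `Candidate`, `toy_generate`, `toy_verify`) are hypothetical. The toy generator and scorer stand in for LLM calls: the generator simply moves closer to the answer as notes accumulate, and the scorer substitutes the answer back into the problem.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    strategy: str
    answer: str
    score: float  # self-assigned verification score (hypothetical), not a ground-truth check


def recursive_thinking(
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str], float],
    strategies: List[str],
    rounds: int = 3,
) -> str:
    """Toy iterative loop: diverse rollouts per round, self-scored selection, shared notes."""
    knowledge: List[str] = []  # notes carried from one round to the next
    best: Optional[Candidate] = None
    for r in range(rounds):
        # One rollout per strategy, each conditioned on its strategy and the shared notes.
        cands = [Candidate(s, generate(s, knowledge), 0.0) for s in strategies]
        for c in cands:
            c.score = verify(c.answer)
        top = max(cands, key=lambda c: c.score)  # first candidate wins ties
        if best is None or top.score > best.score:
            best = top
        knowledge.append(
            f"round {r}: strategy {top.strategy!r} gave {top.answer!r} (score {top.score:.1f})"
        )
    assert best is not None
    return best.answer


# Hypothetical stand-ins for model calls on the toy problem "find x > 0 with x * x == 144".
def toy_generate(strategy: str, knowledge: List[str]) -> str:
    step = len(knowledge)  # toy assumption: more accumulated notes give closer guesses
    return str(8 + step) if strategy == "low" else str(16 - step)


def toy_verify(answer: str) -> float:
    x = int(answer)
    return -abs(x * x - 144)  # self-check by substitution; 0 is a perfect score


print(recursive_thinking(toy_generate, toy_verify, ["low", "high"], rounds=5))
```

With five rounds the two strategies converge from opposite sides and the self-check selects `12`; with a single round the loop can only return the best first guess. A real system would replace both toy functions with prompted model calls, which this sketch does not attempt to reproduce.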

Cite

@article{arxiv.2602.03094,
  title  = {Test-time Recursive Thinking: Self-Improvement without External Feedback},
  author = {Yufan Zhuang and Chandan Singh and Liyuan Liu and Yelong Shen and Dinghuai Zhang and Jingbo Shang and Jianfeng Gao and Weizhu Chen},
  journal= {arXiv preprint arXiv:2602.03094},
  year   = {2026}
}