Test-time Recursive Thinking: Self-Improvement without External Feedback

Yufan Zhuang; Chandan Singh; Liyuan Liu; Yelong Shen; Dinghuai Zhang; Jingbo Shang; Jianfeng Gao; Weizhu Chen

Computation and Language, 2026-02-04, v1

Abstract

Modern Large Language Models (LLMs) have shown rapid improvements in reasoning capabilities, driven largely by reinforcement learning (RL) with verifiable rewards. Here, we ask whether these LLMs can self-improve without additional training. We identify two core challenges for such systems: (i) efficiently generating diverse, high-quality candidate solutions, and (ii) reliably selecting correct answers in the absence of ground-truth supervision. To address these challenges, we propose Test-time Recursive Thinking (TRT), an iterative self-improvement framework that conditions generation on rollout-specific strategies, accumulated knowledge, and self-generated verification signals. With TRT, open-source models reach 100% accuracy on AIME-25/24, and closed-source models improve by 10.4-14.8 percentage points on LiveCodeBench's most difficult problems, without external feedback.
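To make the shape of such a loop concrete, here is a minimal toy sketch, assuming a stand-in "model" rather than an LLM. Each round runs one rollout per strategy, every rollout is conditioned on knowledge accumulated in earlier rounds, and candidates are scored by a self-computed check rather than a ground-truth label. Everything here (the `Knowledge` structure, the `make_scanner` strategies, the `check` score, the stopping rule) is a hypothetical illustration and not the paper's TRT algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class Knowledge:
    # Hypothetical shared memory: candidates that failed self-verification.
    rejected: Set[int] = field(default_factory=set)


Proposer = Callable[[Knowledge], int]


def recursive_think(
    strategies: Dict[str, Proposer],
    self_verify: Callable[[int], float],
    max_rounds: int,
) -> Tuple[Optional[int], float, int]:
    """Toy loop: one rollout per strategy, self-scored, knowledge carried forward."""
    knowledge = Knowledge()
    best: Optional[int] = None
    best_score = float("-inf")
    for round_idx in range(1, max_rounds + 1):
        # One rollout per strategy; all see the same accumulated knowledge.
        candidates = [propose(knowledge) for propose in strategies.values()]
        for cand in candidates:
            score = self_verify(cand)  # self-computed, no ground-truth label
            if score > best_score:
                best, best_score = cand, score
            if score < 0:
                # Record failures so later rounds do not repeat them.
                knowledge.rejected.add(cand)
        if best_score == 0:  # toy stopping rule: a self-verified exact answer
            return best, best_score, round_idx
    return best, best_score, max_rounds


def make_scanner(start: int, step: int) -> Proposer:
    """Hypothetical strategy: scan integers from `start`, skipping rejected ones."""
    def propose(knowledge: Knowledge) -> int:
        x = start
        while x in knowledge.rejected:
            x += step
        return x
    return propose


# Toy problem: find x with x*x + x == 110. The score is computed from the
# problem statement alone (0 means the equation holds).
def check(x: int) -> float:
    return -abs(x * x + x - 110)


strategies = {"ascend": make_scanner(0, 1), "descend": make_scanner(20, -1)}
print(recursive_think(strategies, check, max_rounds=20))  # (10, 0, 11)
```

Storing only rejected candidates as "knowledge" and stopping on a perfect self-score are deliberate simplifications; the abstract describes richer conditioning on strategies, accumulated knowledge, and self-generated verification signals, and the sketch only mirrors that overall structure.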

Keywords

large language model, instruction tuning, prompt engineering

Cite

@article{arxiv.2602.03094,
  title  = {Test-time Recursive Thinking: Self-Improvement without External Feedback},
  author = {Yufan Zhuang and Chandan Singh and Liyuan Liu and Yelong Shen and Dinghuai Zhang and Jingbo Shang and Jianfeng Gao and Weizhu Chen},
  journal= {arXiv preprint arXiv:2602.03094},
  year   = {2026}
}
