Related papers: Test-time Recursive Thinking: Self-Improvement wit…

Language Model Self-improvement by Reinforcement Learning Contemplation

Large Language Models (LLMs) have exhibited remarkable performance across various natural language processing (NLP) tasks. However, fine-tuning these models often necessitates substantial supervision, which can be expensive and…

Computation and Language · Computer Science · 2023-05-25 · Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu

TTRL: Test-Time Reinforcement Learning

This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to…

Computation and Language · Computer Science · 2025-07-01 · Yuxin Zuo, Kaiyan Zhang, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, Biqing Qi, Youbang Sun, Zhiyuan Ma, Lifan Yuan, Ning Ding, Bowen Zhou

RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning

Reinforcement learning (RL) is central to improving reasoning in large language models (LLMs) but typically requires ground-truth rewards. Test-Time Reinforcement Learning (TTRL) removes this need by using majority-vote rewards, but relies…

Machine Learning · Computer Science · 2025-10-06 · Aleksei Arzhantsev, Otmane Sakhi, Flavian Vasile

A Training-Free Regeneration Paradigm: Contrastive Reflection Memory Guided Self-Verification and Self-Improvement

Verification-guided self-improvement has recently emerged as a promising approach to improving the accuracy of large language model (LLM) outputs. However, existing approaches face a trade-off between inference efficiency and accuracy:…

Computation and Language · Computer Science · 2026-03-24 · Yuran Li, Di Wu, Benoit Boulet

Large Language Models Can Self-Improve

Large Language Models (LLMs) have achieved excellent performance on various tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand, may improve their reasoning abilities by self-thinking without…

Computation and Language · Computer Science · 2022-10-26 · Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues…

Computation and Language · Computer Science · 2024-07-19 · Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?

Self-improving large language models (LLMs) -- i.e., to improve the performance of an LLM by fine-tuning it with synthetic data generated by itself -- is a promising way to advance the capabilities of LLMs while avoiding extensive…

Computation and Language · Computer Science · 2025-02-20 · Yutao Sun, Mingshuai Chen, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Jianwei Yin

The Path of Self-Evolving Large Language Models: Achieving Data-Efficient Learning via Intrinsic Feedback

Reinforcement learning (RL) has demonstrated potential in enhancing the reasoning capabilities of large language models (LLMs), but such training typically demands substantial efforts in creating and annotating data. In this work, we…

Computation and Language · Computer Science · 2025-10-06 · Hangfan Zhang, Siyuan Xu, Zhimeng Guo, Huaisheng Zhu, Shicheng Liu, Xinrun Wang, Qiaosheng Zhang, Yang Chen, Peng Ye, Lei Bai, Shuyue Hu

Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models

Recent advancements in large language models (LLMs) have demonstrated that progressive refinement, rather than providing a single answer, results in more accurate and thoughtful outputs. However, existing methods often rely heavily on…

Computation and Language · Computer Science · 2024-10-18 · Chengyu Du, Jinyi Han, Yizhou Ying, Aili Chen, Qianyu He, Haokun Zhao, Sirui Xia, Haoran Guo, Jiaqing Liang, Zulong Chen, Liangyue Li, Yanghua Xiao

Are Retrials All You Need? Enhancing Large Language Model Reasoning Without Verbalized Feedback

Recent advancements in large language models (LLMs) have catalyzed the development of general-purpose autonomous agents, demonstrating remarkable performance in complex reasoning tasks across various domains. This surge has spurred the…

Computation and Language · Computer Science · 2025-04-18 · Nearchos Potamitis, Akhil Arora

Rethinking with Retrieval: Faithful Large Language Model Inference

Despite the success of large language models (LLMs) in various natural language processing (NLP) tasks, the stored knowledge in these models may inevitably be incomplete, out-of-date, or incorrect. This motivates the need to utilize…

Computation and Language · Computer Science · 2023-01-03 · Hangfeng He, Hongming Zhang, Dan Roth

TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement

Test-time Training enables model adaptation using only test questions and offers a promising paradigm for improving the reasoning ability of large language models (LLMs). However, it faces two major challenges: test questions are often…

Computation and Language · Computer Science · 2026-03-05 · Haoyang He, Zihua Rong, Liangjie Zhao, Yunjia Zhao, Lan Yang, Honggang Zhang

Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction

Self-reflection for Large Language Models (LLMs) has gained significant attention. Existing approaches involve models iterating and improving their previous responses based on LLMs' internal reflection ability or external feedback. However,…

Computation and Language · Computer Science · 2025-03-04 · Liping Liu, Chunhong Zhang, Likang Wu, Chuang Zhao, Zheng Hu, Ming He, Jianping Fan

Self-Refine: Iterative Refinement with Self-Feedback

Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through…

Computation and Language · Computer Science · 2023-05-29 · Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark

ThinkTuning: Instilling Cognitive Reflections without Distillation

Recent advances in test-time scaling have led to the emergence of thinking LLMs that exhibit self-reflective behaviors and multi-step reasoning. While RL drives this self-improvement paradigm, a recent study (Gandhi et al., 2025) shows that…

Artificial Intelligence · Computer Science · 2025-08-22 · Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou

Self-Training Large Language Models for Tool-Use Without Demonstrations

Large language models (LLMs) remain prone to factual inaccuracies and computational errors, including hallucinations and mistakes in mathematical reasoning. Recent work augmented LLMs with tools to mitigate these shortcomings, but often…

Computation and Language · Computer Science · 2025-02-11 · Ne Luo, Aryo Pradipta Gema, Xuanli He, Emile van Krieken, Pietro Lesci, Pasquale Minervini

A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula

Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this…

Machine Learning · Computer Science · 2026-03-23 · Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei

Self-Training Meets Consistency: Improving LLMs' Reasoning with Consistency-Driven Rationale Evaluation

The self-training approach for large language models (LLMs) improves reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as…

Machine Learning · Computer Science · 2025-02-07 · Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that…

Machine Learning · Computer Science · 2026-05-18 · Mengqi Li, Lei Zhao, Anthony Man-Cho So, Ruoyu Sun, Xiao Li

Large Language Models Cannot Self-Correct Reasoning Yet

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their…

Computation and Language · Computer Science · 2024-03-15 · Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou