Related papers: Self-Correcting Large Language Models: Generation …

Large Language Models Cannot Self-Correct Reasoning Yet

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their…

Computation and Language · Computer Science · 2024-03-15 · Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou

Self-Evaluation Improves Selective Generation in Large Language Models

Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely…

Computation and Language · Computer Science · 2023-12-18 · Jie Ren, Yao Zhao, Tu Vu, Peter J. Liu, Balaji Lakshminarayanan

Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning

Large Language Models (LLMs) have demonstrated impressive mathematical reasoning capabilities, yet their performance remains brittle to minor variations in problem description and prompting strategy. Furthermore, reasoning is vulnerable to…

Computation and Language · Computer Science · 2025-06-23 · Sam Silver, Jimin Sun, Ivan Zhang, Sara Hooker, Eddie Kim

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic…

Computation and Language · Computer Science · 2023-08-31 · Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang

Self-rewarding correction for mathematical reasoning

We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback. This integrated…

Artificial Intelligence · Computer Science · 2025-02-28 · Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks. Recent studies have explored human-like problem-solving strategies, such as…

Computation and Language · Computer Science · 2023-12-19 · Zhenran Xu, Senbao Shi, Baotian Hu, Jindi Yu, Dongfang Li, Min Zhang, Yuxiang Wu

Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks

While Vision-Language Models (VLMs) have shown remarkable abilities in visual and language reasoning tasks, they invariably generate flawed responses. Self-correction that instructs models to refine their outputs presents a promising…

Computation and Language · Computer Science · 2025-06-06 · Jiayi He, Hehai Lin, Qingyun Wang, Yi Fung, Heng Ji

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights data based on this verification, and…

Computation and Language · Computer Science · 2025-02-26 · Yuda Song, Hanlin Zhang, Carson Eisenach, Sham Kakade, Dean Foster, Udaya Ghai

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only the task's goal without specific details about potential issues in the…

Computation and Language · Computer Science · 2024-11-11 · Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Xitong Zhang, Rongrong Wang, Jiliang Tang, Kristen Johnson

On the Convergence of Moral Self-Correction in Large Language Models

Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only a general and abstract goal without specific details about potential issues in…

Computation and Language · Computer Science · 2025-10-28 · Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Xitong Zhang, Rongrong Wang, Kristen Marie Johnson

Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning

Large Language Models (LLMs) have achieved remarkable capabilities, yet their improvement methods remain fundamentally constrained by human design. We present Self-Developing, a framework that enables LLMs to autonomously discover,…

Computation and Language · Computer Science · 2025-06-11 · Yoichi Ishibashi, Taro Yano, Masafumi Oyamada

Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs), where LLMs refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether…

Computation and Language · Computer Science · 2024-06-07 · Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang

An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems

Large Language Models offer new opportunities to devise automated implementation generation methods that can tackle problem solving activities beyond traditional methods, which require algorithmic specifications and can use only static…

Computation and Language · Computer Science · 2025-01-06 · Hashmath Shaik, Alex Doboli

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs

Self-correction of large language models (LLMs) emerges as a critical component for enhancing their reasoning performance. Although various self-correction methods have been proposed, a comprehensive evaluation of these methods remains…

Computation and Language · Computer Science · 2025-10-23 · Guiyao Tie, Zenghui Yuan, Zeli Zhao, Chaoran Hu, Tianhe Gu, Ruihang Zhang, Sizhe Zhang, Junran Wu, Xiaoyue Tu, Ming Jin, Qingsong Wen, Lixing Chen, Pan Zhou, Lichao Sun

Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs

Large language models (LLMs) are increasingly optimized for long reasoning, under the assumption that more reasoning leads to better performance. However, emerging evidence suggests that longer responses can sometimes degrade accuracy…

Computation and Language · Computer Science · 2025-05-02 · Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie

Calibrating Reasoning in Language Models with Internal Consistency

Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious…

Artificial Intelligence · Computer Science · 2024-12-06 · Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li

Learning to Self-Verify Makes Language Models Better Reasoners

Recent large language models (LLMs) achieve strong performance in generating promising reasoning paths for complex tasks. However, despite powerful generation ability, LLMs remain weak at verifying their own answers, revealing a persistent…

Computation and Language · Computer Science · 2026-02-10 · Yuxin Chen, Yu Wang, Yi Zhang, Ziang Ye, Zhengzhou Cai, Yaorui Shi, Qi Gu, Hui Su, Xunliang Cai, Xiang Wang, An Zhang, Tat-Seng Chua

Language Models can Self-Lengthen to Generate Long Texts

Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to process long contexts, yet a notable gap remains in generating long, aligned outputs. This limitation stems from a training gap where…

Computation and Language · Computer Science · 2024-11-01 · Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback,…

Computation and Language · Computer Science · 2024-12-05 · Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang

PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks, yet they still struggle to reliably verify the correctness of their own outputs. Existing solutions to this verification challenge often…

Computation and Language · Computer Science · 2025-06-13 · Yuhua Jiang, Yuwen Xiong, Yufeng Yuan, Chao Xin, Wenyuan Xu, Yu Yue, Qianchuan Zhao, Lin Yan