Related papers: Characterizing, Evaluating, and Optimizing Complex…

Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL

Reinforcement learning (RL) has recently become the dominant paradigm for strengthening the reasoning abilities of large language models (LLMs). Yet the rule-based reward functions commonly used on mathematical or programming benchmarks…

Artificial Intelligence · Computer Science 2025-09-09 Haoyang He, Zihua Rong, Kun Ji, Chenyang Li, Qing Huang, Chong Xia, Lan Yang, Honggang Zhang

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization

Improving the multi-step reasoning ability of Large Language Models (LLMs) is a critical yet challenging task. The dominant paradigm, outcome-supervised reinforcement learning (RLVR), rewards only correct final answers, often propagating…

Artificial Intelligence · Computer Science 2025-10-14 Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu

Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

Reward models (RMs) play a critical role in enhancing the reasoning performance of LLMs. For example, they can provide training signals to finetune LLMs during reinforcement learning (RL) and help select the best answer from multiple…

Computation and Language · Computer Science 2025-10-06 Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao

Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in…

Machine Learning · Computer Science 2025-08-01 Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, Bing Qin

ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models

Generative Reward Models (GRMs) provide greater flexibility than scalar reward models in capturing human preferences, but their effectiveness is limited by poor reasoning capabilities. This often results in incomplete or overly speculative…

Computation and Language · Computer Science 2025-06-23 Bin Chen, Xinzge Gao, Chuanrui Hu, Penghang Yu, Hua Zhang, Bing-Kun Bao

What Makes Good Multilingual Reasoning? Disentangling Reasoning Traces with Measurable Features

Large Reasoning Models (LRMs) still exhibit large performance gaps between English and other languages, yet much current work assumes these gaps can be closed simply by making reasoning in every language resemble English reasoning. This…

Computation and Language · Computer Science 2026-04-07 Dayeon Ki, Kevin Duh, Marine Carpuat

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their…

Artificial Intelligence · Computer Science 2025-11-21 Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs

Reinforcement learning-based retrieval-augmented generation (RAG) methods enhance the reasoning abilities of large language models (LLMs). However, most rely only on final-answer rewards, overlooking intermediate reasoning quality. This…

Computation and Language · Computer Science 2025-08-07 Jie He, Victor Gutiérrez-Basulto, Jeff Z. Pan

Generative Reasoning Re-ranker

Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three key limitations: (1) most efforts focus on…

Information Retrieval · Computer Science 2026-02-24 Mingfu Liang, Yufei Li, Jay Xu, Kavosh Asadi, Xi Liu, Shuo Gu, Kaushik Rangadurai, Frank Shyu, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Jacob Tao, Shike Mei, Wenlin Chen, Santanu Kolay, Sandeep Pandey, Hamed Firooz, Luke Simon

TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping

The field of Language Reasoning Models (LRMs) has been very active over the past few years, with advances in training and inference techniques enabling LRMs to reason longer and more accurately. However, a growing body of studies shows that…

Computation and Language · Computer Science 2026-04-24 Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher

The Art of Efficient Reasoning: Data, Reward, and Optimization

Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking…

Computation and Language · Computer Science 2026-03-23 Taiqiang Wu, Zenan Xu, Bo Zhou, Ngai Wong

Logical Reasoning with Outcome Reward Models for Test-Time Scaling

Logical reasoning is a critical benchmark for evaluating the capabilities of large language models (LLMs), as it reflects their ability to derive valid conclusions from given premises. While the combination of test-time scaling with…

Computation and Language · Computer Science 2025-08-28 Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi

RM-R1: Reward Modeling as Reasoning

Reward modeling is essential for aligning large language models with human preferences through reinforcement learning. To provide accurate reward signals, a reward model (RM) should stimulate deep thinking and conduct interpretable…

Computation and Language · Computer Science 2026-03-09 Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji

Reward Reasoning Model

Reward models play a critical role in guiding large language models toward outputs that align with human expectations. However, an open challenge remains in effectively utilizing test-time compute to enhance reward model performance. In…

Computation and Language · Computer Science 2025-05-21 Jiaxin Guo, Zewen Chi, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei

Playing Psychic: Using Thought Trees to Predict Reasoning Models Accuracy on Coding Tasks

Recent advances in large language models (LLMs) have shown that test-time scaling can substantially improve model performance on complex tasks, particularly in the coding domain. Under this paradigm, models use a larger token budget during…

Artificial Intelligence · Computer Science 2026-04-21 Jiaxin Fang, Runyuan He, Sahil Bhatia, Neel Gajare, Alvin Cheung

In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback

Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization as it penalizes equally valid alternatives, whereas…

Computation and Language · Computer Science 2025-11-14 Mingye Zhu, Yi Liu, Zheren Fu, Quan Wang, Yongdong Zhang

What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation

Evaluating large language models (LLMs) on final-answer correctness is the dominant paradigm. This approach, however, provides a coarse signal for model improvement and overlooks the quality of the underlying reasoning process. We argue…

Artificial Intelligence · Computer Science 2025-10-24 Heejin Do, Jaehui Hwang, Dongyoon Han, Seong Joon Oh, Sangdoo Yun

Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models

Recent studies show that Large Language Models (LLMs) achieve strong reasoning capabilities through supervised fine-tuning or reinforcement learning. However, a key approach, the Process Reward Model (PRM), suffers from reward hacking,…

Computation and Language · Computer Science 2026-04-10 Teng Wang, Zhangyi Jiang, Zhenqi He, Shenyang Tong, Wenhan Yang, Yanan Zheng, Zeyu Li, Zifan He, Hailei Gong, Zewen Ye, Shengjie Ma, Jianping Zhang

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by…

Computation and Language · Computer Science 2026-04-30 Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Large language models (LLMs) have exhibited extraordinary performance in a variety of tasks while it remains challenging for them to solve complex multi-step tasks as agents. In practice, agents sensitive to the outcome of certain key steps…

Artificial Intelligence · Computer Science 2025-05-28 Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar