Related papers

Related papers: Characterizing, Evaluating, and Optimizing Complex…

200 papers

Reinforcement learning (RL) has recently become the dominant paradigm for strengthening the reasoning abilities of large language models (LLMs). Yet the rule-based reward functions commonly used on mathematical or programming benchmarks…

Artificial Intelligence · Computer Science 2025-09-09 Haoyang He, Zihua Rong, Kun Ji, Chenyang Li, Qing Huang, Chong Xia, Lan Yang, Honggang Zhang

Improving the multi-step reasoning ability of Large Language Models (LLMs) is a critical yet challenging task. The dominant paradigm, outcome-supervised reinforcement learning (RLVR), rewards only correct final answers, often propagating…

Artificial Intelligence · Computer Science 2025-10-14 Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu

Reward models (RMs) play a critical role in enhancing the reasoning performance of LLMs. For example, they can provide training signals to finetune LLMs during reinforcement learning (RL) and help select the best answer from multiple…

Computation and Language · Computer Science 2025-10-06 Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao

Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in…

Machine Learning · Computer Science 2025-08-01 Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, Bing Qin

Generative Reward Models (GRMs) provide greater flexibility than scalar reward models in capturing human preferences, but their effectiveness is limited by poor reasoning capabilities. This often results in incomplete or overly speculative…

Computation and Language · Computer Science 2025-06-23 Bin Chen, Xinzge Gao, Chuanrui Hu, Penghang Yu, Hua Zhang, Bing-Kun Bao

Large Reasoning Models (LRMs) still exhibit large performance gaps between English and other languages, yet much current work assumes these gaps can be closed simply by making reasoning in every language resemble English reasoning. This…

Computation and Language · Computer Science 2026-04-07 Dayeon Ki, Kevin Duh, Marine Carpuat

Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their…

Artificial Intelligence · Computer Science 2025-11-21 Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

Reinforcement learning-based retrieval-augmented generation (RAG) methods enhance the reasoning abilities of large language models (LLMs). However, most rely only on final-answer rewards, overlooking intermediate reasoning quality. This…

Computation and Language · Computer Science 2025-08-07 Jie He, Victor Gutiérrez-Basulto, Jeff Z. Pan

Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three key limitations: (1) most efforts focus on…

The field of Language Reasoning Models (LRMs) has been very active over the past few years, with advances in training and inference techniques enabling LRMs to reason longer and more accurately. However, a growing body of studies shows that…

Computation and Language · Computer Science 2026-04-24 Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher

Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking…

Computation and Language · Computer Science 2026-03-23 Taiqiang Wu, Zenan Xu, Bo Zhou, Ngai Wong

Logical reasoning is a critical benchmark for evaluating the capabilities of large language models (LLMs), as it reflects their ability to derive valid conclusions from given premises. While the combination of test-time scaling with…

Computation and Language · Computer Science 2025-08-28 Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi

Reward modeling is essential for aligning large language models with human preferences through reinforcement learning. To provide accurate reward signals, a reward model (RM) should stimulate deep thinking and conduct interpretable…

Computation and Language · Computer Science 2026-03-09 Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji

Reward models play a critical role in guiding large language models toward outputs that align with human expectations. However, an open challenge remains in effectively utilizing test-time compute to enhance reward model performance. In…

Computation and Language · Computer Science 2025-05-21 Jiaxin Guo, Zewen Chi, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei

Recent advances in large language models (LLMs) have shown that test-time scaling can substantially improve model performance on complex tasks, particularly in the coding domain. Under this paradigm, models use a larger token budget during…

Artificial Intelligence · Computer Science 2026-04-21 Jiaxin Fang, Runyuan He, Sahil Bhatia, Neel Gajare, Alvin Cheung

Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization as it penalizes equally valid alternatives, whereas…

Computation and Language · Computer Science 2025-11-14 Mingye Zhu, Yi Liu, Zheren Fu, Quan Wang, Yongdong Zhang

Evaluating large language models (LLMs) on final-answer correctness is the dominant paradigm. This approach, however, provides a coarse signal for model improvement and overlooks the quality of the underlying reasoning process. We argue…

Artificial Intelligence · Computer Science 2025-10-24 Heejin Do, Jaehui Hwang, Dongyoon Han, Seong Joon Oh, Sangdoo Yun

Recent studies show that Large Language Models (LLMs) achieve strong reasoning capabilities through supervised fine-tuning or reinforcement learning. However, a key approach, the Process Reward Model (PRM), suffers from reward hacking,…

Computation and Language · Computer Science 2026-04-10 Teng Wang, Zhangyi Jiang, Zhenqi He, Shenyang Tong, Wenhan Yang, Yanan Zheng, Zeyu Li, Zifan He, Hailei Gong, Zewen Ye, Shengjie Ma, Jianping Zhang

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by…

Computation and Language · Computer Science 2026-04-30 Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang

Large language models (LLMs) have exhibited extraordinary performance in a variety of tasks while it remains challenging for them to solve complex multi-step tasks as agents. In practice, agents sensitive to the outcome of certain key steps…

Artificial Intelligence · Computer Science 2025-05-28 Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar