Related papers: ToolComp: A Multi-Tool Reasoning & Process Supervi…

ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents

Reward-guided search methods have demonstrated strong potential in enhancing tool-using agents by effectively guiding sampling and exploration over complex action spaces. As a core design, those search methods utilize process reward models…

Artificial Intelligence · Computer Science 2026-01-21 Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a…

Computation and Language · Computer Science 2024-12-13 Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Meiqi Guo, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

While Multimodal Large Language Models (MLLMs) have achieved impressive progress in vision-language understanding, they still struggle with complex multi-step reasoning, often producing logically inconsistent or partially correct solutions.…

Artificial Intelligence · Computer Science 2025-06-06 Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao

Let's Verify Step by Step

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can…

Machine Learning · Computer Science 2023-06-01 Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure

Reasoning is central to a wide range of intellectual activities, and while the capabilities of large language models (LLMs) continue to advance, their performance in reasoning tasks remains limited. The processes and mechanisms underlying…

Artificial Intelligence · Computer Science 2024-10-07 Ippei Fujisawa, Sensho Nobe, Hiroki Seto, Rina Onda, Yoshiaki Uchida, Hiroki Ikoma, Pei-Chun Chien, Ryota Kanai

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

Recent advancements in Large Language Models (LLMs) have shown that it is promising to utilize Process Reward Models (PRMs) as verifiers to enhance the performance of LLMs. However, current PRMs face three key challenges: (1) limited…

Computation and Language · Computer Science 2025-04-08 Jian Zhao, Runze Liu, Kaiyan Zhang, Zhimu Zhou, Junqi Gao, Dong Li, Jiafei Lyu, Zhouyi Qian, Biqing Qi, Xiu Li, Bowen Zhou

Process Reward Models That Think

Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling. PRMs require step-level supervision, making them expensive to train. This work aims to build data-efficient PRMs as…

Machine Learning · Computer Science 2025-12-09 Muhammad Khalifa, Rishabh Agarwal, Lajanugen Logeswaran, Jaekyeom Kim, Hao Peng, Moontae Lee, Honglak Lee, Lu Wang

An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning

Enhancing the mathematical reasoning capabilities of Large Language Models (LLMs) is of great scientific and practical significance. Researchers typically employ process-supervised reward models (PRMs) to guide the reasoning process,…

Computation and Language · Computer Science 2025-07-24 Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang

TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models

Despite impressive advances in large language models (LLMs), existing benchmarks often focus on single-turn or single-step tasks, failing to capture the kind of iterative reasoning required in real-world settings. To address this…

Computation and Language · Computer Science 2025-11-26 Yiran Zhang, Mo Wang, Xiaoyang Li, Kaixuan Ren, Chencheng Zhu, Usman Naseem

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Integrating external tools into Large Foundation Models (LFMs) has emerged as a promising approach to enhance their problem-solving capabilities. While existing studies have demonstrated strong performance in tool-augmented Visual Question…

Artificial Intelligence · Computer Science 2026-03-05 Shaofeng Yin, Ting Lei, Yang Liu

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by…

Computation and Language · Computer Science 2026-04-30 Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang

TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM

Multimodal Large Language Models (MLLMs) have achieved impressive performances in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that standard outcome-based supervision fails to…

Artificial Intelligence · Computer Science 2026-01-01 Peng Kuang, Xiangxiang Wang, Wentao Liu, Jian Dong, Kaidi Xu

Unsupervised Process Reward Models

Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cost: PRMs require expert annotations…

Machine Learning · Computer Science 2026-05-12 Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo, Maria Brbic

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment. To address this…

Computation and Language · Computer Science 2024-02-20 Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

While Large Language Models (LLMs) have evolved into tool-using agents, they remain brittle in long-horizon interactions. Unlike mathematical reasoning where errors are often rectifiable via backtracking, tool-use failures frequently induce…

Artificial Intelligence · Computer Science 2026-03-17 Shengda Fan, Xuyan Ye, Yupeng Huo, Zhi-Yuan Chen, Yiju Guo, Shenzhi Yang, Wenkai Yang, Shuqi Ye, Jingwen Chen, Haotian Chen, Xin Cong, Yankai Lin

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Process Reward Models (PRMs) emerge as a promising approach for process supervision in mathematical reasoning of Large Language Models (LLMs), which aim to identify and mitigate intermediate errors in the reasoning processes. However, the…

Computation and Language · Computer Science 2025-06-06 Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin

Process Supervision for Chain-of-Thought Reasoning via Monte Carlo Net Information Gain

Multi-step reasoning improves the capabilities of large language models (LLMs) but increases the risk of errors propagating through intermediate steps. Process reward models (PRMs) mitigate this by scoring each step individually, enabling…

Computation and Language · Computer Science 2026-03-19 Corentin Royer, Debarun Bhattacharjya, Gaetano Rossiello, Andrea Giovannini, Mennatallah El-Assady

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification

Reasoning is an essential capacity for large language models (LLMs) to address complex tasks, where the identification of process errors is vital for improving this ability. Recently, process-level reward models (PRMs) were proposed to…

Artificial Intelligence · Computer Science 2025-03-18 Zhaopan Xu, Pengfei Zhou, Jiaxin Ai, Wangbo Zhao, Kai Wang, Xiaojiang Peng, Wenqi Shao, Hongxun Yao, Kaipeng Zhang

Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards

Large language models have shown promise in clinical decision making, but current approaches struggle to localize and correct errors at specific steps of the reasoning process. This limitation is critical in medicine, where identifying and…

Computation and Language · Computer Science 2025-09-23 Jaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Yanjun Shao, Yonghoe Koo, Minhyeok Ko, Qingyu Chen, Mark Gerstein, Michael Moor, Jaewoo Kang

ToolTalk: Evaluating Tool-Usage in a Conversational Setting

Large language models (LLMs) have displayed massive improvements in reasoning and decision-making skills and can hold natural conversations with users. Many recent works seek to augment LLM-based assistants with external tools so they can…

Computation and Language · Computer Science 2023-11-21 Nicholas Farn, Richard Shin