
Related papers: ToolComp: A Multi-Tool Reasoning & Process Supervi…

200 papers

Reward-guided search methods have demonstrated strong potential in enhancing tool-using agents by effectively guiding sampling and exploration over complex action spaces. As a core design, those search methods utilize process reward models…

Artificial Intelligence · Computer Science 2026-01-21 Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo

Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a…

Computation and Language · Computer Science 2024-12-13 Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Meiqi Guo, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

While Multimodal Large Language Models (MLLMs) have achieved impressive progress in vision-language understanding, they still struggle with complex multi-step reasoning, often producing logically inconsistent or partially correct solutions.…

Artificial Intelligence · Computer Science 2025-06-06 Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can…

Reasoning is central to a wide range of intellectual activities, and while the capabilities of large language models (LLMs) continue to advance, their performance in reasoning tasks remains limited. The processes and mechanisms underlying…

Artificial Intelligence · Computer Science 2024-10-07 Ippei Fujisawa, Sensho Nobe, Hiroki Seto, Rina Onda, Yoshiaki Uchida, Hiroki Ikoma, Pei-Chun Chien, Ryota Kanai

Recent advancements in Large Language Models (LLMs) have shown that it is promising to utilize Process Reward Models (PRMs) as verifiers to enhance the performance of LLMs. However, current PRMs face three key challenges: (1) limited…

Computation and Language · Computer Science 2025-04-08 Jian Zhao, Runze Liu, Kaiyan Zhang, Zhimu Zhou, Junqi Gao, Dong Li, Jiafei Lyu, Zhouyi Qian, Biqing Qi, Xiu Li, Bowen Zhou

Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling. PRMs require step-level supervision, making them expensive to train. This work aims to build data-efficient PRMs as…

Machine Learning · Computer Science 2025-12-09 Muhammad Khalifa, Rishabh Agarwal, Lajanugen Logeswaran, Jaekyeom Kim, Hao Peng, Moontae Lee, Honglak Lee, Lu Wang

Enhancing the mathematical reasoning capabilities of Large Language Models (LLMs) is of great scientific and practical significance. Researchers typically employ process-supervised reward models (PRMs) to guide the reasoning process,…

Computation and Language · Computer Science 2025-07-24 Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang

Despite impressive advances in large language models (LLMs), existing benchmarks often focus on single-turn or single-step tasks, failing to capture the kind of iterative reasoning required in real-world settings. To address this…

Computation and Language · Computer Science 2025-11-26 Yiran Zhang, Mo Wang, Xiaoyang Li, Kaixuan Ren, Chencheng Zhu, Usman Naseem

Integrating external tools into Large Foundation Models (LFMs) has emerged as a promising approach to enhance their problem-solving capabilities. While existing studies have demonstrated strong performance in tool-augmented Visual Question…

Artificial Intelligence · Computer Science 2026-03-05 Shaofeng Yin, Ting Lei, Yang Liu

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by…

Computation and Language · Computer Science 2026-04-30 Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang

Multimodal Large Language Models (MLLMs) have achieved impressive performance in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that standard outcome-based supervision fails to…

Artificial Intelligence · Computer Science 2026-01-01 Peng Kuang, Xiangxiang Wang, Wentao Liu, Jian Dong, Kaidi Xu

Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cost: PRMs require expert annotations…

Machine Learning · Computer Science 2026-05-12 Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo, Maria Brbic

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment. To address this…

Computation and Language · Computer Science 2024-02-20 Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao

While Large Language Models (LLMs) have evolved into tool-using agents, they remain brittle in long-horizon interactions. Unlike mathematical reasoning where errors are often rectifiable via backtracking, tool-use failures frequently induce…

Artificial Intelligence · Computer Science 2026-03-17 Shengda Fan, Xuyan Ye, Yupeng Huo, Zhi-Yuan Chen, Yiju Guo, Shenzhi Yang, Wenkai Yang, Shuqi Ye, Jingwen Chen, Haotian Chen, Xin Cong, Yankai Lin

Process Reward Models (PRMs) have emerged as a promising approach to process supervision in the mathematical reasoning of Large Language Models (LLMs), aiming to identify and mitigate intermediate errors in the reasoning process. However, the…

Computation and Language · Computer Science 2025-06-06 Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin

Multi-step reasoning improves the capabilities of large language models (LLMs) but increases the risk of errors propagating through intermediate steps. Process reward models (PRMs) mitigate this by scoring each step individually, enabling…

Computation and Language · Computer Science 2026-03-19 Corentin Royer, Debarun Bhattacharjya, Gaetano Rossiello, Andrea Giovannini, Mennatallah El-Assady

Reasoning is an essential capacity for large language models (LLMs) to address complex tasks, where the identification of process errors is vital for improving this ability. Recently, process-level reward models (PRMs) were proposed to…

Artificial Intelligence · Computer Science 2025-03-18 Zhaopan Xu, Pengfei Zhou, Jiaxin Ai, Wangbo Zhao, Kai Wang, Xiaojiang Peng, Wenqi Shao, Hongxun Yao, Kaipeng Zhang

Large language models have shown promise in clinical decision making, but current approaches struggle to localize and correct errors at specific steps of the reasoning process. This limitation is critical in medicine, where identifying and…

Large language models (LLMs) have displayed massive improvements in reasoning and decision-making skills and can hold natural conversations with users. Many recent works seek to augment LLM-based assistants with external tools so they can…

Computation and Language · Computer Science 2023-11-21 Nicholas Farn, Richard Shin