Related papers: AlphaMath Almost Zero: Process Supervision without…

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a…

Computation and Language · Computer Science · 2024-12-13 · Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Meiqi Guo, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Large language models (LLMs) have demonstrated their remarkable capacity across a variety of tasks. However, reasoning remains a challenge for LLMs. To improve LLMs' reasoning ability, process supervision has proven to be better than…

Artificial Intelligence · Computer Science · 2025-01-06 · Shuangtao Li, Shuaihao Dong, Kexin Luan, Xinhan Di, Chaofan Ding

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of…

Computation and Language · Computer Science · 2024-12-11 · Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search

Large Language Models (LLMs) offer promising capabilities for tackling complex reasoning tasks, including optimization problems. However, existing methods either rely on prompt engineering, which leads to poor generalization across problem…

Machine Learning · Computer Science · 2025-10-23 · Dong Li, Xujiang Zhao, Linlin Yu, Yanchi Liu, Wei Cheng, Zhengzhang Chen, Zhong Chen, Feng Chen, Chen Zhao, Haifeng Chen

No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function

Large language models (LLMs) demonstrate impressive language understanding and contextual learning abilities, making them suitable for natural language processing (NLP) tasks and complex mathematical reasoning. However, when applied to…

Artificial Intelligence · Computer Science · 2023-09-13 · Haotian Xu

LLMs Could Autonomously Learn Without External Supervision

In the quest for superhuman performance, Large Language Models (LLMs) have traditionally been tethered to human-annotated datasets and predefined training objectives, a process that is both labor-intensive and inherently limited. This paper…

Computation and Language · Computer Science · 2024-06-10 · Ke Ji, Junying Chen, Anningzhe Gao, Wenya Xie, Xiang Wan, Benyou Wang

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

While Multimodal Large Language Models (MLLMs) have achieved impressive progress in vision-language understanding, they still struggle with complex multi-step reasoning, often producing logically inconsistent or partially correct solutions.…

Artificial Intelligence · Computer Science · 2025-06-06 · Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao

Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a wide range of NLP tasks, demonstrating the ability to reason and apply commonsense. A relevant application is to use them for creating high quality…

Computation and Language · Computer Science · 2024-07-11 · Vinay Samuel, Houda Aynaou, Arijit Ghosh Chowdhury, Karthik Venkat Ramanan, Aman Chadha

Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision

Large language models (LLMs) have shown strong performance in many reasoning benchmarks. However, recent studies have pointed to memorization, rather than generalization, as one of the leading causes for such performance. LLMs, in fact, are…

Computation and Language · Computer Science · 2025-09-19 · Xingwei Tan, Marco Valentino, Mahmud Akhter, Maria Liakata, Nikolaos Aletras

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using…

Artificial Intelligence · Computer Science · 2024-02-20 · Peiyi Wang, Lei Li, Zhihong Shao, R. X. Xu, Damai Dai, Yifei Li, Deli Chen, Y. Wu, Zhifang Sui

What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning

Step-level reward models (SRMs) can significantly enhance mathematical reasoning performance through process supervision or step-level preference alignment based on reinforcement learning. The performance of SRMs is pivotal, as they serve…

Artificial Intelligence · Computer Science · 2025-03-11 · Yiran Ma, Zui Chen, Tianqiao Liu, Mi Tian, Zhuo Liu, Zitao Liu, Weiqi Luo

Unsupervised Process Reward Models

Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cost: PRMs require expert annotations…

Machine Learning · Computer Science · 2026-05-12 · Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo, Maria Brbic

SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition

While Large Language Models (LLMs) have achieved remarkable success in a wide range of applications, their performance often degrades in complex reasoning tasks. In this work, we introduce SELT (Self-Evaluation LLM Tree Search), a novel…

Computation and Language · Computer Science · 2025-06-10 · Mengsong Wu, Di Zhang, Yuqiang Li, Dongzhan Zhou, Wenliang Chen

CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model

This paper introduces the Constrained Monte Carlo Tree Search (CMCTS) framework to enhance the mathematical reasoning capabilities of Large Language Models (LLMs). By incorporating a constrained action space, Process Reward Model (PRM), and…

Computation and Language · Computer Science · 2025-06-17 · Qingwen Lin, Boyan Xu, Guimin Hu, Zijian Li, Zhifeng Hao, Keli Zhang, Ruichu Cai

SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval

Large language models (LLMs) are probabilistic in nature and perform more reliably when augmented with external information. As complex queries often require multi-step reasoning over the retrieved information, with no clear or…

Information Retrieval · Computer Science · 2026-04-10 · Roxana Petcu, Evangelos Kanoulas, Maarten de Rijke

LLM-First Search: Self-Guided Exploration of the Solution Space

Large Language Models (LLMs) have demonstrated remarkable improvements in reasoning and planning through increased test-time compute, often by framing problem-solving as a search process. While methods like Monte Carlo Tree Search (MCTS)…

Artificial Intelligence · Computer Science · 2025-06-06 · Nathan Herr, Tim Rocktäschel, Roberta Raileanu

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues…

Computation and Language · Computer Science · 2024-07-19 · Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte…

Artificial Intelligence · Computer Science · 2024-06-19 · Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search

Recent advancements in large language models (LLMs) have shown remarkable potential in automating machine learning tasks. However, existing LLM-based agents often struggle with low-diversity and suboptimal code generation. While recent work…

Computation and Language · Computer Science · 2026-01-26 · Zujie Liang, Feng Wei, Wujiang Xu, Lin Chen, Yuxi Qian, Xinhui Wu

An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning

Enhancing the mathematical reasoning capabilities of Large Language Models (LLMs) is of great scientific and practical significance. Researchers typically employ process-supervised reward models (PRMs) to guide the reasoning process,…

Computation and Language · Computer Science · 2025-07-24 · Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang