Related papers: Scaling Agentic Verifier for Competitive Coding

AgentV-RL: Scaling Reward Modeling with Agentic Verifier

Verifiers have been demonstrated to enhance LLM reasoning via test-time scaling (TTS). Yet, they face significant challenges in complex domains. Error propagation from incorrect intermediate reasoning can lead to false positives for…

Computation and Language · Computer Science 2026-04-20 Jiazheng Zhang, Ziche Fu, Zhiheng Xi, Wenqing Jing, Mingxu Chai, Wei He, Guoqiang Zhang, Chenghao Fan, Chenxin An, Wenxiang Chen, Zhicheng Liu, Haojie Pan, Dingwei Zhu, Tao Gui, Qi Zhang, Xuanjing Huang

Scaling Test-time Compute for LLM Agents

Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents…

Artificial Intelligence · Computer Science 2025-06-17 King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

By utilizing more computational resources at test-time, large language models (LLMs) can improve without additional training. One common strategy uses verifiers to evaluate candidate outputs. In this work, we propose a novel scaling…

Artificial Intelligence · Computer Science 2025-02-28 Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and…

Machine Learning · Computer Science 2026-03-02 Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Automating Formal Verification with Agent-Guided Tree Search

Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks…

Logic in Computer Science · Computer Science 2026-05-28 Leo Yao

Agentic Code Optimization via Compiler-LLM Cooperation

Generating performant executables from high-level languages is critical to software performance across a wide range of domains. Modern compilers perform this task by passing code through a series of well-studied optimizations at…

Programming Languages · Computer Science 2026-04-07 Benjamin Mikek, Danylo Vashchilenko, Bryan Lu, Panpan Xu

AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities

AI agentic programming is an emerging paradigm where large language model (LLM)-based coding agents autonomously plan, execute, and interact with tools such as compilers, debuggers, and version control systems. Unlike conventional code…

Software Engineering · Computer Science 2025-09-16 Huanting Wang, Jingzhi Gong, Huawei Zhang, Jie Xu, Zheng Wang

Scaling Agents via Continual Pre-training

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models…

Computation and Language · Computer Science 2025-09-17 Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

Scaling Test-Time Compute for Agentic Coding

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this…

Software Engineering · Computer Science 2026-04-22 Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal

Trust but Verify! A Survey on Verification Design for Test-time Scaling

Test-time scaling (TTS) has emerged as a new frontier for scaling the performance of Large Language Models. In test-time scaling, by using more computational resources during inference, LLMs can improve their reasoning process and task…

Computation and Language · Computer Science 2025-09-10 V Venktesh, Mandeep Rathee, Avishek Anand

From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning

The evaluation of large language models (LLMs) has predominantly relied on static datasets, which offer limited scalability and fail to capture the evolving reasoning capabilities of recent models. To overcome these limitations, we propose…

Computation and Language · Computer Science 2026-03-02 Seungdong Yoa, Sanghyu Yoon, Suhee Yoon, Dongmin Kim, Ye Seul Sim, Junhyun Lee, Woohyung Lim

Agentic Software Issue Resolution with Large Language Models: A Survey

Software issue resolution aims to address real-world issues in software repositories (e.g., bug fixing and efficiency optimization) based on natural language descriptions provided by users, representing a key aspect of software maintenance.…

Software Engineering · Computer Science 2025-12-30 Zhonghao Jiang, David Lo, Zhongxin Liu

VerifiAgent: a Unified Verification Agent in Language Model Reasoning

Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, requiring significant computational…

Computation and Language · Computer Science 2025-08-22 Jiuzhou Han, Wray Buntine, Ehsan Shareghi

Illuminating LLM Coding Agents: Visual Analytics for Deeper Understanding and Enhancement

Coding agents powered by large language models (LLMs) have gained traction for automating code generation through iterative problem-solving with minimal human involvement. Despite the emergence of various frameworks, e.g., LangChain,…

Machine Learning · Computer Science 2025-08-19 Junpeng Wang, Yuzhong Chen, Menghai Pan, Chin-Chia Michael Yeh, Mahashweta Das

Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards. However, the reasoning underlying these solutions is often…

Artificial Intelligence · Computer Science 2025-11-18 Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, Igor Gitman

CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass.…

Software Engineering · Computer Science 2026-02-25 Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

Position: Agentic Evolution is the Path to Evolving LLMs

As Large Language Models (LLMs) move from curated training sets into open-ended real-world environments, a fundamental limitation emerges: static training cannot keep pace with continual deployment environment change. Scaling training-time…

Artificial Intelligence · Computer Science 2026-03-17 Minhua Lin, Hanqing Lu, Zhan Shi, Bing He, Rui Mao, Zhiwei Zhang, Zongyu Wu, Xianfeng Tang, Hui Liu, Zhenwei Dai, Xiang Zhang, Suhang Wang, Benoit Dumoulin, Jian Pei

Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

Recent advancements in agentic test-time scaling allow models to gather environmental feedback before committing to final actions. A key limitation of existing methods is that they typically employ undifferentiated exploration strategies,…

Artificial Intelligence · Computer Science 2026-05-13 Xingyuan Hua, Sheng Yue, Ju Ren

Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Synthetic verification techniques such as generating test cases and reward modelling are common ways to enhance the coding capabilities of large language models (LLMs) beyond predefined tests. Additionally, code verification has recently…

Artificial Intelligence · Computer Science 2025-07-31 Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg

Towards Agentic Self-Learning LLMs in Search Environment

We study whether self-learning can scale LLM-based agents without relying on human-curated datasets or predefined rule-based rewards. Through controlled experiments in a search-agent setting, we identify two key determinants of scalable…

Artificial Intelligence · Computer Science 2025-10-22 Wangtao Sun, Xiang Cheng, Jialin Fan, Yao Xu, Xing Yu, Shizhu He, Jun Zhao, Kang Liu