Related papers: Scaling Agentic Verifier for Competitive Coding

200 papers

Verifiers have been demonstrated to enhance LLM reasoning via test-time scaling (TTS). Yet, they face significant challenges in complex domains. Error propagation from incorrect intermediate reasoning can lead to false positives for…

Scaling test time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents…

By utilizing more computational resources at test-time, large language models (LLMs) can improve without additional training. One common strategy uses verifiers to evaluate candidate outputs. In this work, we propose a novel scaling…

Artificial Intelligence · Computer Science 2025-02-28 Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and…

Machine Learning · Computer Science 2026-03-02 Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks…

Logic in Computer Science · Computer Science 2026-05-28 Leo Yao

Generating performant executables from high level languages is critical to software performance across a wide range of domains. Modern compilers perform this task by passing code through a series of well-studied optimizations at…

Programming Languages · Computer Science 2026-04-07 Benjamin Mikek, Danylo Vashchilenko, Bryan Lu, Panpan Xu

AI agentic programming is an emerging paradigm where large language model (LLM)-based coding agents autonomously plan, execute, and interact with tools such as compilers, debuggers, and version control systems. Unlike conventional code…

Software Engineering · Computer Science 2025-09-16 Huanting Wang, Jingzhi Gong, Huawei Zhang, Jie Xu, Zheng Wang

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models…

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this…

Test-time scaling (TTS) has emerged as a new frontier for scaling the performance of Large Language Models. In test-time scaling, by using more computational resources during inference, LLMs can improve their reasoning process and task…

Computation and Language · Computer Science 2025-09-10 V Venktesh, Mandeep Rathee, Avishek Anand

The evaluation of large language models (LLMs) has predominantly relied on static datasets, which offer limited scalability and fail to capture the evolving reasoning capabilities of recent models. To overcome these limitations, we propose…

Computation and Language · Computer Science 2026-03-02 Seungdong Yoa, Sanghyu Yoon, Suhee Yoon, Dongmin Kim, Ye Seul Sim, Junhyun Lee, Woohyung Lim

Software issue resolution aims to address real-world issues in software repositories (e.g., bug fixing and efficiency optimization) based on natural language descriptions provided by users, representing a key aspect of software maintenance.…

Software Engineering · Computer Science 2025-12-30 Zhonghao Jiang, David Lo, Zhongxin Liu

Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, requiring significant computational…

Computation and Language · Computer Science 2025-08-22 Jiuzhou Han, Wray Buntine, Ehsan Shareghi

Coding agents powered by large language models (LLMs) have gained traction for automating code generation through iterative problem-solving with minimal human involvement. Despite the emergence of various frameworks, e.g., LangChain,…

Machine Learning · Computer Science 2025-08-19 Junpeng Wang, Yuzhong Chen, Menghai Pan, Chin-Chia Michael Yeh, Mahashweta Das

Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards. However, the reasoning underlying these solutions is often…

The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass.…

Software Engineering · Computer Science 2026-02-25 Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

As Large Language Models (LLMs) move from curated training sets into open-ended real-world environments, a fundamental limitation emerges: static training cannot keep pace with continual deployment environment change. Scaling training-time…

Artificial Intelligence · Computer Science 2026-03-17 Minhua Lin, Hanqing Lu, Zhan Shi, Bing He, Rui Mao, Zhiwei Zhang, Zongyu Wu, Xianfeng Tang, Hui Liu, Zhenwei Dai, Xiang Zhang, Suhang Wang, Benoit Dumoulin, Jian Pei

Recent advancements in agentic test-time scaling allow models to gather environmental feedback before committing to final actions. A key limitation of existing methods is that they typically employ undifferentiated exploration strategies,…

Artificial Intelligence · Computer Science 2026-05-13 Xingyuan Hua, Sheng Yue, Ju Ren

Synthetic verification techniques such as generating test cases and reward modelling are common ways to enhance the coding capabilities of large language models (LLM) beyond predefined tests. Additionally, code verification has recently…

Artificial Intelligence · Computer Science 2025-07-31 Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg

We study whether self-learning can scale LLM-based agents without relying on human-curated datasets or predefined rule-based rewards. Through controlled experiments in a search-agent setting, we identify two key determinants of scalable…

Artificial Intelligence · Computer Science 2025-10-22 Wangtao Sun, Xiang Cheng, Jialin Fan, Yao Xu, Xing Yu, Shizhu He, Jun Zhao, Kang Liu