Related papers: Agentic Code Reasoning

Towards Verified Code Reasoning by LLMs

While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software…

Software Engineering · Computer Science 2025-11-17 Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José Cambronero, Michele Tufano, Satish Chandra

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

How should an agent decide when and how to plan? A dominant approach builds agents as reactive policies with adaptive computation (e.g., chain-of-thought), trained end-to-end expecting planning to emerge implicitly. Without control over the…

Artificial Intelligence · Computer Science 2026-05-22 Mingkai Deng, Jinyu Hou, Lara Sá Neves, Varad Pimpalkhute, Taylor W. Killian, Zhengzhong Liu, Eric P. Xing

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Agentic Reasoning dynamically leverages web search, code execution, and structured memory to address…

Artificial Intelligence · Computer Science 2025-07-16 Junde Wu, Jiayuan Zhu, Yuyuan Liu, Min Xu, Yueming Jin

LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios

Recent advances in the intrinsic reasoning capabilities of large language models (LLMs) have given rise to LLM-based agent systems that exhibit near-human performance on a variety of automated tasks. However, although these systems share…

Artificial Intelligence · Computer Science 2025-08-26 Bingxi Zhao, Lin Geng Foo, Ping Hu, Christian Theobalt, Hossein Rahmani, Jun Liu

Agentic Model Checking

Verifying LLM-generated systems code is hard: bugs are prevalent, formal specifications are missing, and safety contracts are encoded implicitly at call sites rather than enforced at function boundaries. We propose agentic model checking, a…

Software Engineering · Computer Science 2026-05-21 Youcheng Sun, Jiawen Liu, Daniel Kroening, Jason Xue

Automating Formal Verification with Agent-Guided Tree Search

Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks…

Logic in Computer Science · Computer Science 2026-05-28 Leo Yao

The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE

AI-based systems, currently driven largely by LLMs and tool-using agentic harnesses, are increasingly discussed as a possible threat to software engineering. Foundation models get stronger, agents can plan and act across multiple steps, and…

Software Engineering · Computer Science 2026-04-24 Robert Feldt, Per Lenberg, Julian Frattini, Dhasarathy Parthasarathy

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

As LLMs are increasingly deployed as agents, agentic reasoning - the ability to combine tool use, especially search, and reasoning - becomes a critical skill. However, it is hard to disentangle agentic reasoning when evaluated in complex…

Artificial Intelligence · Computer Science 2025-10-03 Hanlin Zhu, Tianyu Guo, Song Mei, Stuart Russell, Nikhil Ghosh, Alberto Bietti, Jiantao Jiao

Demystifying Reinforcement Learning in Agentic Reasoning

Recently, the emergence of agentic RL has showcased that RL could also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive…

Computation and Language · Computer Science 2025-10-14 Zhaochen Yu, Ling Yang, Jiaru Zou, Shuicheng Yan, Mengdi Wang

SemLoc: Structured Grounding of Free-Form LLM Reasoning for Fault Localization

Fault localization identifies program locations responsible for observed failures. Existing techniques rank suspicious code using syntactic spectra--signals derived from execution structure such as statement coverage, control-flow…

Software Engineering · Computer Science 2026-04-01 Zhaorui Yang, Haichao Zhu, Qian Zhang, Rajiv Gupta, Ashish Kundu

Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis

Large language models can consult information that fixed static analyzers cannot, such as documentation, current security advisories, version-specific metadata, and informal API contracts. This makes LLMs a compelling option for program…

Software Engineering · Computer Science 2026-05-14 Jacqueline L. Mitchell, Chao Wang

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Reasoning is a fundamental component of language understanding. Recent prompting techniques, such as chain of thought, have consistently improved LLMs' performance on various reasoning tasks. Nevertheless, there is still little…

Computation and Language · Computer Science 2024-10-01 Haritz Puerto, Martin Tutek, Somak Aditya, Xiaodan Zhu, Iryna Gurevych

Latent State Estimation Helps UI Agents to Reason

A common problem for agents operating in real-world environments is that the response of an environment to their actions may be non-deterministic and observed through noise. This renders environmental state and progress towards completing a…

Artificial Intelligence · Computer Science 2024-05-21 William E Bishop, Alice Li, Christopher Rawles, Oriana Riva

Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code

Counterfactual reasoning, a hallmark of intelligence, consists of three steps: inferring latent variables from observations (abduction), constructing alternatives (interventions), and predicting their outcomes (prediction). This skill is…

Machine Learning · Computer Science 2025-10-03 Aniket Vashishtha, Qirun Dai, Hongyuan Mei, Amit Sharma, Chenhao Tan, Hao Peng

SemAgent: A Semantics Aware Program Repair Agent

Large Language Models (LLMs) have shown impressive capabilities in downstream software engineering tasks such as Automated Program Repair (APR). In particular, there has been a lot of research on repository-level issue-resolution benchmarks…

Software Engineering · Computer Science 2025-06-23 Anvith Pabba, Alex Mathai, Anindya Chakraborty, Baishakhi Ray

Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

Learned classifiers deployed in agentic pipelines face a fundamental reliability problem: predictions are probabilistic inferences, not verified conclusions, and acting on them without grounding in observable evidence leads to compounding…

Software Engineering · Computer Science 2026-04-14 Jugal Gajjar

Agentic Proof Automation: A Case Study

Proof engineering is notoriously labor-intensive: proofs that are straightforward on paper often require lengthy scripts in theorem provers. Recent advances in large language models (LLMs) create new opportunities for proof automation:…

Programming Languages · Computer Science 2026-01-08 Yichen Xu, Martin Odersky

Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure

Coding agents represent a new paradigm in automated software engineering, combining the reasoning capabilities of Large Language Models (LLMs) with tool-augmented interaction loops. However, coding agents still have severe limitations.…

Software Engineering · Computer Science 2026-04-06 Tural Mehtiyev, Wesley Assunção

SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning

Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text…

Computation and Language · Computer Science 2024-11-04 Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray