Related papers: REDO: Execution-Free Runtime Error Detection for C…

Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors

In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actual execution. A typical example is to detect runtime exceptions in online code snippets before integrating them…

Software Engineering · Computer Science 2025-12-29 Hridya Dhulipala, Xiaokai Rong, Tien N. Nguyen

SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution

Large language models (LLMs) exhibit strong performance on self-contained programming tasks. However, they still struggle with repository-level software engineering (SWE), which demands (1) deep codebase navigation with effective context…

Software Engineering · Computer Science 2026-05-27 Kang He, Kaushik Roy

A Self-Improving Coding Agent

Recent advancements in Large Language Models (LLMs) have spurred interest in deploying LLM agents to undertake tasks in the world. LLMs are often deployed in agent systems: code that orchestrates LLM calls and provides them with tools. We…

Artificial Intelligence · Computer Science 2025-05-20 Maxime Robeyns, Martin Szummer, Laurence Aitchison

Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories

The increasing deployment of Large Language Model (LLM) agents for complex software engineering tasks has created a need to understand their problem-solving behaviours beyond simple success metrics. While these agents demonstrate impressive…

Software Engineering · Computer Science 2025-11-04 Oorja Majgaonkar, Zhiwei Fei, Xiang Li, Federica Sarro, He Ye

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training data is scarce, especially data that…

Software Engineering · Computer Science 2025-11-05 Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich, Anton Shevtsov, Simon Karasik, Andrei Andriushchenko, Maria Trofimova, Daria Litvintseva, Boris Yangel

Agentless: Demystifying LLM-based Software Engineering Agents

Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry…

Software Engineering · Computer Science 2024-10-30 Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang

SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents

Rigorous software testing is crucial for developing and maintaining high-quality code, making automated test generation a promising avenue for both improving software quality and boosting the effectiveness of code generation methods.…

Software Engineering · Computer Science 2025-02-10 Niels Mündler, Mark Niklas Müller, Jingxuan He, Martin Vechev

An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems

The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in securing modern codebases. This paper presents a comparative study on the effectiveness of…

Software Engineering · Computer Science 2026-01-05 Md Hasan Saju, Maher Muhtadi, Akramul Azim

SWE-RM: Execution-free Feedback For Software Engineering Agents

Execution-based feedback like unit testing is widely used in the development of coding agents through test-time scaling (TTS) and reinforcement learning (RL). This paradigm requires scalable and reliable collection of unit test cases to…

Computation and Language · Computer Science 2025-12-29 KaShun Shum, Binyuan Hui, Jiawei Chen, Lei Zhang, X. W., Jiaxi Yang, Yuzhen Huang, Junyang Lin, Junxian He

Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models

The wide adoption of large language models (LLMs) makes their dependability a pressing concern. Detection of errors is the first step to mitigating their impact on a system and thus, efficient error detection for LLMs is an important issue.…

Artificial Intelligence · Computer Science 2025-09-17 Jinhua Zhu, Javier Conde, Zhen Gao, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi

SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks

The rapid advancement of Large Language Models (LLMs) in software engineering has revealed critical limitations in existing benchmarks, particularly the widely used SWE-bench dataset. Recent studies have uncovered severe data contamination…

Software Engineering · Computer Science 2025-07-18 Pavel Adamenko, Mikhail Ivanov, Aidar Valeev, Rodion Levichev, Pavel Zadorozhny, Ivan Lopatin, Dmitry Babayev, Alena Fenogenova, Valentin Malykh

An Empirical Study of Speculative Decoding on Software Engineering Tasks

Large Language Models (LLMs) have become widely used for Software Engineering (SE) tasks, spanning from function-level code generation to complex repository-level workflows. However, the high latency of autoregressive inference remains a…

Software Engineering · Computer Science 2026-05-05 Yijia Li, Junkai Chen, Xing Hu, Xin Xia

LAMeD: LLM-generated Annotations for Memory Leak Detection

Static analysis tools are widely used to detect software bugs and vulnerabilities but often struggle with scalability and efficiency in complex codebases. Traditional approaches rely on manually crafted annotations -- labeling functions as…

Software Engineering · Computer Science 2025-05-06 Ekaterina Shemetova, Ilya Shenbin, Ivan Smirnov, Anton Alekseev, Alexey Rukhovich, Sergey Nikolenko, Vadim Lomshakov, Irina Piontkovskaya

Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios

AI-driven software development has rapidly advanced with the emergence of software development agents that leverage large language models (LLMs) to tackle complex, repository-level software engineering tasks. These agents go beyond just…

Software Engineering · Computer Science 2026-04-10 Zhi Chen, Wei Ma, Lingxiao Jiang

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming capabilities. While these advancements…

Software Engineering · Computer Science 2025-11-12 Chengquan Guo, Chulin Xie, Yu Yang, Zhaorun Chen, Zinan Lin, Xander Davies, Yarin Gal, Dawn Song, Bo Li

Illuminating LLM Coding Agents: Visual Analytics for Deeper Understanding and Enhancement

Coding agents powered by large language models (LLMs) have gained traction for automating code generation through iterative problem-solving with minimal human involvement. Despite the emergence of various frameworks, e.g., LangChain,…

Machine Learning · Computer Science 2025-08-19 Junpeng Wang, Yuzhong Chen, Menghai Pan, Chin-Chia Michael Yeh, Mahashweta Das

Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements

This study examined code issue detection and revision automation by integrating Large Language Models (LLMs) such as OpenAI's GPT-3.5 Turbo and GPT-4o into software development workflows. A static code analysis framework detects issues such…

Software Engineering · Computer Science 2025-06-13 Seyed Moein Abtahi, Akramul Azim

DOCE: Finding the Sweet Spot for Execution-Based Code Generation

Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by…

Computation and Language · Computer Science 2024-10-17 Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure

Coding agents represent a new paradigm in automated software engineering, combining the reasoning capabilities of Large Language Models (LLMs) with tool-augmented interaction loops. However, coding agents still have severe limitations.…

Software Engineering · Computer Science 2026-04-06 Tural Mehtiyev, Wesley Assunção

Code Reasoning for Software Engineering Tasks: A Survey and A Call to Action

The rise of large language models (LLMs) has led to dramatic improvements across a wide range of natural language tasks. Their performance on certain tasks can be further enhanced by incorporating test-time reasoning techniques. These…

Software Engineering · Computer Science 2026-01-13 Saurabh Pujar, Ira Ceka, Irene Manotas, Gail Kaiser, Baishakhi Ray, Shyam Ramji