
REDO: Execution-Free Runtime Error Detection for COding Agents

Categories: Software Engineering; Artificial Intelligence (v1, 2024-10-15)

Abstract

As LLM-based agents exhibit exceptional capabilities in addressing complex problems, there is a growing focus on developing coding agents to tackle increasingly sophisticated tasks. Despite their promising performance, these coding agents often produce programs or modifications that contain runtime errors, which can cause code failures and are difficult for static analysis tools to detect. Enhancing the ability of coding agents to statically identify such errors could significantly improve their overall performance. In this work, we introduce Execution-free Runtime Error Detection for COding Agents (REDO), a method that integrates LLMs with static analysis tools to detect runtime errors for coding agents without executing code. Additionally, we propose a benchmark task, SWE-Bench-Error-Detection (SWEDE), based on SWE-Bench (lite), to evaluate error detection in repository-level problems with complex external dependencies. Finally, through both quantitative and qualitative analyses across various error detection tasks, we demonstrate that REDO outperforms current state-of-the-art methods, achieving an 11.0% higher accuracy and a 9.1% higher weighted F1 score, and we provide insights into the advantages of incorporating LLMs for error detection.
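The weighted F1 score cited above is the average of per-class F1 scores, with each class weighted by its support (the number of samples whose true label is that class). The minimal Python sketch below assumes a single-label, multi-class setting; the function name `weighted_f1` and the error-type labels in the usage example are hypothetical and are not taken from SWEDE or the paper's evaluation code. Under these assumptions it should agree with scikit-learn's `f1_score(..., average="weighted")` with default zero-division handling.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # False positives: predicted this label but the true label differs.
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + false negatives
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight each class F1 by its support
    return score


# Hypothetical error-type labels for four code samples.
y_true = ["NoError", "TypeError", "TypeError", "AttributeError"]
y_pred = ["NoError", "TypeError", "AttributeError", "AttributeError"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
```

Weighting by support keeps rare error types from dominating the average, while still reporting more than plain accuracy when the label distribution is imbalanced.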


Cite

@article{arxiv.2410.09117,
  title  = {REDO: Execution-Free Runtime Error Detection for COding Agents},
  author = {Shou Li and Andrey Kan and Laurent Callot and Bhavana Bhasker and Muhammad Shihab Rashid and Timothy B Esler},
  journal= {arXiv preprint arXiv:2410.09117},
  year   = {2024}
}

Comments

27 pages, 13 figures, 6 tables
