Related papers: Towards a Neural Debugger for Python

Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs.…

Software Engineering · Computer Science 2024-06-12 Li Zhong, Zilong Wang, Jingbo Shang

debug-gym: A Text-Based Environment for Interactive Debugging

Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit…

Artificial Intelligence · Computer Science 2025-03-28 Xingdi Yuan, Morgane M Moss, Charbel El Feghali, Chinmay Singh, Darya Moldavskaya, Drew MacPhee, Lucas Caccia, Matheus Pereira, Minseon Kim, Alessandro Sordoni, Marc-Alexandre Côté

Revisit Self-Debugging with Self-Generated Tests for Code Generation

Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of…

Software Engineering · Computer Science 2025-01-23 Xiancai Chen, Zhengwei Tao, Kechi Zhang, Changzhi Zhou, Wanli Gu, Yuanpeng He, Mengdi Zhang, Xunliang Cai, Haiyan Zhao, Zhi Jin

Large Language Model Powered Symbolic Execution

Large Language Models (LLMs) have emerged as a promising alternative to traditional static program analysis methods, such as symbolic execution, offering the ability to reason over code directly without relying on theorem provers or SMT…

Programming Languages · Computer Science 2025-09-22 Yihe Li, Ruijie Meng, Gregory J. Duck

Teaching Large Language Models to Self-Debug

Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair…

Computation and Language · Computer Science 2023-10-06 Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou

A Systematic Approach for Large Language Models Debugging

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their…

Artificial Intelligence · Computer Science 2026-04-28 Basel Shbita, Anna Lisa Gentile, Bing Zhang, Sungeun An, Shailja Thakur, Shubhi Asthana, Yi Zhou, Saptha Surendran, Farhan Ahmed, Rohan Kulkarni, Yuya Jeremy Ong, Chad DeLuca, Hima Patel

NExT: Teaching Large Language Models to Reason about Code Execution

A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (aka. rubber duck…

Machine Learning · Computer Science 2024-04-24 Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton, Pengcheng Yin

Testing Neural Program Analyzers

Deep neural networks have been increasingly used in software engineering and program analysis tasks. They usually take a program and make some predictions about it, e.g., bug prediction. We call these models neural program analyzers. The…

Machine Learning · Computer Science 2021-03-22 Md Rafiqul Islam Rabin, Ke Wang, Mohammad Amin Alipour

Code Execution with Pre-trained Language Models

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and…

Programming Languages · Computer Science 2023-05-10 Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan

ChatDBG: Augmenting Debugging with Large Language Models

Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of…

Software Engineering · Computer Science 2025-06-23 Kyla H. Levin, Nicolas van Kempen, Emery D. Berger, Stephen N. Freund

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for…

Computation and Language · Computer Science 2024-04-04 Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

Leveraging Print Debugging to Improve Code Generation in Large Language Models

Large language models (LLMs) have made significant progress in code generation tasks, but their performance in tackling programming problems with complex data structures and algorithms remains suboptimal. To address this issue, we propose…

Computation and Language · Computer Science 2024-01-11 Xueyu Hu, Kun Kuang, Jiankai Sun, Hongxia Yang, Fei Wu

SGLang: Efficient Execution of Structured Language Model Programs

Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming…

Artificial Intelligence · Computer Science 2024-06-07 Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

Can Large Language Models Solve Path Constraints in Symbolic Execution?

Symbolic execution is an important software analysis technique which benefits downstream tasks such as software testing and debugging. However, several limitations hinder symbolic execution from application on real-world software. One of…

Software Engineering · Computer Science 2025-11-25 Wenhan Wang, Kaibo Liu, Zeyu Sun, An Ran Chen, Ge Li, Gang Huang, Lei Ma

Fault-Aware Neural Code Rankers

Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend…

Programming Languages · Computer Science 2022-12-13 Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu K Lahiri, Madanlal Musuvathi, Jianfeng Gao

Debugging with Open-Source Large Language Models: An Evaluation

Large language models have shown good potential in supporting software development tasks. This is why more and more developers turn to LLMs (e.g. ChatGPT) to support them in fixing their buggy code. While this can save time and effort, many…

Software Engineering · Computer Science 2024-09-06 Yacine Majdoub, Eya Ben Charrada

AgentStepper: Interactive Debugging of Software Development Agents

Software development agents powered by large language models (LLMs) have shown great promise in automating tasks like environment setup, issue solving, and program repair. Unfortunately, understanding and debugging such agents remain…

Software Engineering · Computer Science 2026-02-09 Robert Hutter, Michael Pradel

Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging

Large Language Models (LLMs) can generate plausible test code. Intuitively they generate this by imitating tests seen in their training data, rather than reasoning about execution semantics. However, such reasoning is important when…

Software Engineering · Computer Science 2025-03-12 Philipp Straubinger, Marvin Kreis, Stephan Lukasczyk, Gordon Fraser

VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems.…

Computation and Language · Computer Science 2024-10-07 Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

LExecutor: Learning-Guided Execution

Executing code is essential for various program analysis tasks, e.g., to detect bugs that manifest through exceptions or to obtain execution traces for further dynamic analysis. However, executing an arbitrary piece of code is often…

Software Engineering · Computer Science 2023-11-13 Beatriz Souza, Michael Pradel