Related papers: Debug like a Human: A Large Language Model Debugge…

A Systematic Approach for Large Language Models Debugging

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their…

Artificial Intelligence · Computer Science 2026-04-28 Basel Shbita, Anna Lisa Gentile, Bing Zhang, Sungeun An, Shailja Thakur, Shubhi Asthana, Yi Zhou, Saptha Surendran, Farhan Ahmed, Rohan Kulkarni, Yuya Jeremy Ong, Chad DeLuca, Hima Patel

Teaching Large Language Models to Self-Debug

Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair…

Computation and Language · Computer Science 2023-10-06 Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou

Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Improved Accuracy, Reliability, and Latency

The use of large language models (LLMs) for automated code generation has emerged as a significant focus within AI research. As these pretrained models continue to evolve, their ability to understand and generate complex code structures has…

Software Engineering · Computer Science 2025-05-06 Nazmus Ashrafi, Salah Bouktif, Mohammed Mediani

DePro: Understanding the Role of LLMs in Debugging Competitive Programming Code

Debugging consumes a substantial portion of the software development lifecycle, yet the effectiveness of Large Language Models (LLMs) in this task is not well understood. Competitive programming offers a rich benchmark for such evaluation,…

Software Engineering · Computer Science 2026-03-23 Nabiha Parvez, Tanvin Sarkar Pallab, Mia Mohammad Imran, Tarannum Shaila Zaman

Towards a Neural Debugger for Python

Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team…

Machine Learning · Computer Science 2026-03-11 Maximilian Beck, Jonas Gehring, Jannik Kossen, Gabriel Synnaeve

RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance

Large Language Models (LLMs) have shown incredible potential in code generation tasks, and recent research in prompt engineering has enhanced LLMs' understanding of textual information. However, ensuring the accuracy of generated code…

Software Engineering · Computer Science 2024-10-04 Haolin Jin, Zechao Sun, Huaming Chen

Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

Unlike code completion, debugging requires localizing faults and applying targeted edits. We observe that frontier LLMs often regenerate correct but over-edited solutions during debugging. To evaluate how far LLMs are from precise…

Software Engineering · Computer Science 2026-05-19 Wang Bill Zhu, Miaosen Chai, Shangshang Wang, Yejia Liu, Song Bian, Honghua Dong, Willie Neiswanger, Robin Jia

DebugBench: Evaluating Debugging Capability of Large Language Models

Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…

Software Engineering · Computer Science 2024-06-07 Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun

ChatDBG: Augmenting Debugging with Large Language Models

Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of…

Software Engineering · Computer Science 2025-06-23 Kyla H. Levin, Nicolas van Kempen, Emery D. Berger, Stephen N. Freund

HDLdebugger: Streamlining HDL debugging with Large Language Models

In the domain of chip design, Hardware Description Languages (HDLs) play a pivotal role. However, due to the complex syntax of HDLs and the limited availability of online resources, debugging HDL codes remains a difficult and time-intensive…

Hardware Architecture · Computer Science 2024-03-19 Xufeng Yao, Haoyang Li, Tsz Ho Chan, Wenyi Xiao, Mingxuan Yuan, Yu Huang, Lei Chen, Bei Yu

Revisit Self-Debugging with Self-Generated Tests for Code Generation

Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of…

Software Engineering · Computer Science 2025-01-23 Xiancai Chen, Zhengwei Tao, Kechi Zhang, Changzhi Zhou, Wanli Gu, Yuanpeng He, Mengdi Zhang, Xunliang Cai, Haiyan Zhao, Zhi Jin

LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops

Large Language Models (LLMs) are showing remarkable performance in generating source code, yet the generated code often has issues like compilation errors or incorrect code. Researchers and developers often face wasted effort in…

Software Engineering · Computer Science 2026-03-26 Ravin Ravi, Dylan Bradshaw, Stefano Ruberto, Gunel Jahangirova, Valerio Terragni

Planning-Driven Programming: A Large Language Model Programming Workflow

The strong performance of large language models (LLMs) raises extensive discussion on their application to code generation. Recent research suggests continuous program refinements through visible tests to improve code generation accuracy in…

Software Engineering · Computer Science 2025-05-26 Chao Lei, Yanchuan Chang, Nir Lipovetzky, Krista A. Ehinger

Effective Large Language Model Debugging with Best-first Tree Search

Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A…

Software Engineering · Computer Science 2024-07-30 Jialin Song, Jonathan Raiman, Bryan Catanzaro

MdEval: Massively Multilingual Code Debugging

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippets and their…

Computation and Language · Computer Science 2025-02-25 Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Ke Jin, Wei Zhang, Hualei Zhu, Shuyue Guo, Tao Sun, Jiaheng Liu, Yunlong Duan, Yu Hao, Liqun Yang, Guanglin Niu, Ge Zhang, Zhoujun Li

NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging

Debugging is a critical aspect of LLMs' coding ability. Early debugging efforts primarily focused on code-level analysis, which often falls short when addressing complex programming errors that require a deeper understanding of algorithmic…

Computation and Language · Computer Science 2025-10-30 Weiming Zhang, Qingyao Li, Xinyi Dai, Jizheng Chen, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems. Existing…

Computation and Language · Computer Science 2025-11-25 Yuling Shi, Songsong Wang, Chengcheng Wan, Min Wang, Xiaodong Gu

Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…

Software Engineering · Computer Science 2023-11-10 Sungmin Kang, Juyeon Yoon, Nargiz Askarbekkyzy, Shin Yoo

Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework

Large Language Models (LLMs) are increasingly applied to automated software testing, yet their ability to generalize beyond memorized patterns and reason about natural language bug reports remains unclear. We present a systematic evaluation…

Software Engineering · Computer Science 2025-10-08 Irtaza Sajid Qureshi, Zhen Ming Jiang

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent works…

Software Engineering · Computer Science 2025-02-27 Tong Ye, Weigang Huang, Xuhong Zhang, Tengfei Ma, Peiyu Liu, Jianwei Yin, Wenhai Wang