Related papers: Teaching Large Language Models to Self-Debug

Leveraging Print Debugging to Improve Code Generation in Large Language Models

Large language models (LLMs) have made significant progress in code generation tasks, but their performance in tackling programming problems with complex data structures and algorithms remains suboptimal. To address this issue, we propose…

Computation and Language · Computer Science 2024-01-11 Xueyu Hu , Kun Kuang , Jiankai Sun , Hongxia Yang , Fei Wu

Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs.…

Software Engineering · Computer Science 2024-06-12 Li Zhong , Zilong Wang , Jingbo Shang

Revisit Self-Debugging with Self-Generated Tests for Code Generation

Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of…

Software Engineering · Computer Science 2025-01-23 Xiancai Chen , Zhengwei Tao , Kechi Zhang , Changzhi Zhou , Wanli Gu , Yuanpeng He , Mengdi Zhang , Xunliang Cai , Haiyan Zhao , Zhi Jin

Effective Large Language Model Debugging with Best-first Tree Search

Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A…

Software Engineering · Computer Science 2024-07-30 Jialin Song , Jonathan Raiman , Bryan Catanzaro

Debugging with Open-Source Large Language Models: An Evaluation

Large language models have shown good potential in supporting software development tasks. This is why more and more developers turn to LLMs (e.g. ChatGPT) to support them in fixing their buggy code. While this can save time and effort, many…

Software Engineering · Computer Science 2024-09-06 Yacine Majdoub , Eya Ben Charrada

Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning

Large Language Models (LLMs) have achieved remarkable capabilities, yet their improvement methods remain fundamentally constrained by human design. We present Self-Developing, a framework that enables LLMs to autonomously discover,…

Computation and Language · Computer Science 2025-06-11 Yoichi Ishibashi , Taro Yano , Masafumi Oyamada

Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…

Software Engineering · Computer Science 2023-11-10 Sungmin Kang , Juyeon Yoon , Nargiz Askarbekkyzy , Shin Yoo

DebugBench: Evaluating Debugging Capability of Large Language Models

Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…

Software Engineering · Computer Science 2024-06-07 Runchu Tian , Yining Ye , Yujia Qin , Xin Cong , Yankai Lin , Yinxu Pan , Yesai Wu , Haotian Hui , Weichuan Liu , Zhiyuan Liu , Maosong Sun

Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging

Large Language Models (LLMs) can generate plausible test code. Intuitively they generate this by imitating tests seen in their training data, rather than reasoning about execution semantics. However, such reasoning is important when…

Software Engineering · Computer Science 2025-03-12 Philipp Straubinger , Marvin Kreis , Stephan Lukasczyk , Gordon Fraser

LeDex: Training LLMs to Better Self-Debug and Explain Code

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging…

Computation and Language · Computer Science 2025-02-17 Nan Jiang , Xiaopeng Li , Shiqi Wang , Qiang Zhou , Soneya Binta Hossain , Baishakhi Ray , Varun Kumar , Xiaofei Ma , Anoop Deoras

HDLdebugger: Streamlining HDL debugging with Large Language Models

In the domain of chip design, Hardware Description Languages (HDLs) play a pivotal role. However, due to the complex syntax of HDLs and the limited availability of online resources, debugging HDL codes remains a difficult and time-intensive…

Hardware Architecture · Computer Science 2024-03-19 Xufeng Yao , Haoyang Li , Tsz Ho Chan , Wenyi Xiao , Mingxuan Yuan , Yu Huang , Lei Chen , Bei Yu

A Systematic Approach for Large Language Models Debugging

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their…

Artificial Intelligence · Computer Science 2026-04-28 Basel Shbita , Anna Lisa Gentile , Bing Zhang , Sungeun An , Shailja Thakur , Shubhi Asthana , Yi Zhou , Saptha Surendran , Farhan Ahmed , Rohan Kulkarni , Yuya Jeremy Ong , Chad DeLuca , Hima Patel

RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models

Large Language Models (LLMs) have exhibited significant proficiency in code debugging, especially in automatic program repair, which may substantially reduce the time consumption of developers and enhance their efficiency. Significant…

Software Engineering · Computer Science 2025-09-09 Jingjing Liu , Zeming Liu , Zihao Cheng , Mengliang He , Xiaoming Shi , Yuhang Guo , Xiangrong Zhu , Yuanfang Guo , Yunhong Wang , Haifeng Wang

DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging

Automated Program Repair (APR) has benefited from the code understanding and generation capabilities of Large Language Models (LLMs). Existing feedback-based APR methods iteratively refine candidate patches using test execution feedback and…

Software Engineering · Computer Science 2026-04-22 Linhao Wu , Yifei Pei , Zhen Yang , Kainan Li , Zhonghang Lu , Hao Tan , Xiran Lyu , Jia Li , Yizhou Chen , Pengyu Xue , Kunwu Zheng , Dan Hao

MdEval: Massively Multilingual Code Debugging

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippet and their…

Computation and Language · Computer Science 2025-02-25 Shukai Liu , Linzheng Chai , Jian Yang , Jiajun Shi , He Zhu , Liran Wang , Ke Jin , Wei Zhang , Hualei Zhu , Shuyue Guo , Tao Sun , Jiaheng Liu , Yunlong Duan , Yu Hao , Liqun Yang , Guanglin Niu , Ge Zhang , Zhoujun Li

Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework

Large Language Models (LLMs) are increasingly applied to automated software testing, yet their ability to generalize beyond memorized patterns and reason about natural language bug reports remains unclear. We present a systematic evaluation…

Software Engineering · Computer Science 2025-10-08 Irtaza Sajid Qureshi , Zhen Ming , Jiang

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this…

Computation and Language · Computer Science 2024-02-15 Yejie Wang , Keqing He , Guanting Dong , Pei Wang , Weihao Zeng , Muxi Diao , Yutao Mou , Mengdi Zhang , Jingang Wang , Xunliang Cai , Weiran Xu

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…

Software Engineering · Computer Science 2025-10-20 Shihan Dou , Haoxiang Jia , Shenxi Wu , Huiyuan Zheng , Muling Wu , Yunbo Tao , Ming Zhang , Mingxu Chai , Jessica Fan , Zhiheng Xi , Rui Zheng , Yueming Wu , Ming Wen , Tao Gui , Qi Zhang , Xipeng Qiu , Xuanjing Huang

Towards a Neural Debugger for Python

Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team…

Machine Learning · Computer Science 2026-03-11 Maximilian Beck , Jonas Gehring , Jannik Kossen , Gabriel Synnaeve

Bugs in Large Language Models Generated Code: An Empirical Study

Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream in Software Engineering (SE), i.e.,…

Software Engineering · Computer Science 2024-03-19 Florian Tambon , Arghavan Moradi Dakhel , Amin Nikanjam , Foutse Khomh , Michel C. Desmarais , Giuliano Antoniol