What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
Abstract
The rapid progress of LLMs in code generation has drawn significant attention from researchers. To enhance LLM-based code generation, current efforts focus predominantly on collecting high-quality datasets and applying diverse training techniques. However, comprehensive studies examining the limitations and boundaries of existing methods are notably lacking. To bridge this gap, we conducted an extensive empirical study evaluating the performance of three leading closed-source LLMs and six popular open-source LLMs on three commonly used benchmarks. Our investigation, which evaluated the length, cyclomatic complexity, and number of APIs of the generated code, revealed that these LLMs struggle to generate correct code for more complex problems and tend to produce code that is shorter yet more complicated than the canonical solutions. Additionally, we developed a taxonomy of bugs in incorrect code that comprises three categories and ten subcategories, and analyzed the root causes of common bug types. To better understand the performance of LLMs in real-world projects, we also manually created a real-world benchmark, RWPB. Our analysis of bugs on RWPB highlights distinct differences in bug distributions between real-world scenarios and existing benchmarks. Finally, we propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback. Our study provides insights into the current limitations of LLM-based code generation and into opportunities for improving the accuracy and quality of the generated code.
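The abstract names three static measures of generated code (length, cyclomatic complexity, and number of APIs) without saying how each is computed. A minimal sketch of one plausible way to measure them for Python code with the standard `ast` module is given here. The counting rules are our assumptions, not necessarily the study's definitions: length as non-blank, non-comment lines; complexity as 1 plus the number of decision points (a simplified McCabe count); and API number as the count of distinct call targets. All function names are hypothetical.

```python
import ast

# Node types that each add one independent path (simplified McCabe count; an assumption).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each comprehension `for` loops; each `if` filter branches.
            complexity += 1 + len(node.ifs)
    return complexity


def code_length(source: str) -> int:
    """Count non-blank lines, skipping comment-only lines."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def api_count(source: str) -> int:
    """Count distinct called names such as `range` or `math.isqrt` (hypothetical proxy)."""
    called = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            called.add(ast.unparse(node.func))  # requires Python 3.9+
    return len(called)


sample = '''
import math

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True
'''

print(code_length(sample), cyclomatic_complexity(sample), api_count(sample))  # prints: 8 4 2
```

Working on the syntax tree rather than on raw text keeps the counts stable under reformatting; a dedicated tool would also handle cases this sketch ignores, such as `match` statements or nested functions reported separately.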
Cite
@article{arxiv.2407.06153,
title = {What's Wrong with Your Code Generated by Large Language Models? An Extensive Study},
author = {Shihan Dou and Haoxiang Jia and Shenxi Wu and Huiyuan Zheng and Muling Wu and Yunbo Tao and Ming Zhang and Mingxu Chai and Jessica Fan and Zhiheng Xi and Rui Zheng and Yueming Wu and Ming Wen and Tao Gui and Qi Zhang and Xipeng Qiu and Xuanjing Huang},
journal = {arXiv preprint arXiv:2407.06153},
year = {2025}
}
Comments
Accepted by SCIENCE CHINA Information Sciences (SCIS)