Related papers: Precise Debugging Benchmark: Is Your Model Debuggi…

200 papers

Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs.…

Software Engineering · Computer Science 2024-06-12 Li Zhong , Zilong Wang , Jingbo Shang

As the adoption of Deep Learning (DL) systems continues to rise, an increasing number of approaches are being proposed to test these systems, localise faults within them, and repair those faults. The best attestation of effectiveness for…

Software Engineering · Computer Science 2024-12-24 Gunel Jahangirova , Nargiz Humbatova , Jinhan Kim , Shin Yoo , Paolo Tonella

LLMs are transforming software development, yet current code generation and code repair benchmarks mainly assess syntactic and functional correctness in simple, single-error cases. LLMs' capabilities to autonomously find and fix runtime…

Computation and Language · Computer Science 2025-09-17 Zhiyu Yang , Shuo Wang , Yukun Yan , Yang Deng

Debugging CUDA programs has long been challenging because failures often arise from subtle interactions among hardware behavior, compiler decisions, memory hierarchy, and asynchronous execution. More importantly, with the rapid expansion of…

Machine Learning · Computer Science 2026-05-27 Shiyang Li , Haoyang Chen , Mattia Fazzini , Caiwen Ding

Debugging consumes a substantial portion of the software development lifecycle, yet the effectiveness of Large Language Models (LLMs) in this task is not well understood. Competitive programming offers a rich benchmark for such evaluation,…

Software Engineering · Computer Science 2026-03-23 Nabiha Parvez , Tanvin Sarkar Pallab , Mia Mohammad Imran , Tarannum Shaila Zaman

Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…

Software Engineering · Computer Science 2024-06-07 Runchu Tian , Yining Ye , Yujia Qin , Xin Cong , Yankai Lin , Yinxu Pan , Yesai Wu , Haotian Hui , Weichuan Liu , Zhiyuan Liu , Maosong Sun

Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair…

Computation and Language · Computer Science 2023-10-06 Xinyun Chen , Maxwell Lin , Nathanael Schärli , Denny Zhou

Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of…

Software Engineering · Computer Science 2025-01-23 Xiancai Chen , Zhengwei Tao , Kechi Zhang , Changzhi Zhou , Wanli Gu , Yuanpeng He , Mengdi Zhang , Xunliang Cai , Haiyan Zhao , Zhi Jin

The rapid escalation of applying Machine Learning (ML) in various domains has led to paying more attention to the quality of ML components. There is then a growth of techniques and tools aiming at improving the quality of ML components and…

Software Engineering · Computer Science 2023-01-18 Mohammad Mehdi Morovati , Amin Nikanjam , Foutse Khomh , Zhen Ming (Jack) Jiang

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

Large language models (LLMs) are trained through multi-stage pipelines over heterogeneous data sources, yet developers lack a principled way to pinpoint the specific data responsible for an observed behavior. This lack of observability…

Computation and Language · Computer Science 2026-03-19 Wenjie Jacky Mo , Qin Liu , Xiaofei Wen , Wenxuan Zhou , Zhe Zhao , Muhao Chen

DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on realistic code completion tasks. It includes 1,800 evaluation instances across six programming languages and six task categories derived from real…

Machine Learning · Computer Science 2026-05-19 Adarsh Kumarappan , Pareesa Ameneh Golnari , Wen Wen , Xiaoyu Liu , Gabriel Ryan , Yuting Sun , Shengyu Fu , Elsie Nallipogu

Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. To evaluate model performance in these domains, numerous bug benchmarks containing real-world…

Software Engineering · Computer Science 2025-04-01 Daniel Ramos , Claudia Mamede , Kush Jain , Paulo Canelas , Catarina Gamboa , Claire Le Goues

Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, the training data used to develop these models often contain a significant amount of buggy code. Yet, it remains unclear to what extent these…

Software Engineering · Computer Science 2025-03-17 Liwei Guo , Sixiang Ye , Zeyu Sun , Xiang Chen , Yuxia Zhang , Bo Wang , Jie M. Zhang , Zheng Li , Yong Liu

Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to large language models (LLMs), motivating automated test generation. However, we uncover a trade-off between generating unit test…

Software Engineering · Computer Science 2025-08-22 Archiki Prasad , Elias Stengel-Eskin , Justin Chih-Yao Chen , Zaid Khan , Mohit Bansal

Among areas of software engineering where AI techniques -- particularly, Large Language Models -- seem poised to yield dramatic improvements, an attractive candidate is Automatic Program Repair (APR), the production of satisfactory…

Software Engineering · Computer Science 2025-08-05 Li Huang , Ilgiz Mustafin , Marco Piccioni , Alessandro Schena , Reto Weber , Bertrand Meyer

Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A…

Software Engineering · Computer Science 2024-07-30 Jialin Song , Jonathan Raiman , Bryan Catanzaro

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging…

Computation and Language · Computer Science 2025-02-17 Nan Jiang , Xiaopeng Li , Shiqi Wang , Qiang Zhou , Soneya Binta Hossain , Baishakhi Ray , Varun Kumar , Xiaofei Ma , Anoop Deoras

While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems. Existing…

Computation and Language · Computer Science 2025-11-25 Yuling Shi , Songsong Wang , Chengcheng Wan , Min Wang , Xiaodong Gu

Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program…

Cryptography and Security · Computer Science 2024-08-22 Yu Liu , Lang Gao , Mingxin Yang , Yu Xie , Ping Chen , Xiaojin Zhang , Wei Chen