Related papers: The GitHub Recent Bugs Dataset for Evaluating LLM-…

Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…

Software Engineering · Computer Science 2023-11-10 Sungmin Kang , Juyeon Yoon , Nargiz Askarbekkyzy , Shin Yoo

DebugBench: Evaluating Debugging Capability of Large Language Models

Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…

Software Engineering · Computer Science 2024-06-07 Runchu Tian , Yining Ye , Yujia Qin , Xin Cong , Yankai Lin , Yinxu Pan , Yesai Wu , Haotian Hui , Weichuan Liu , Zhiyuan Liu , Maosong Sun

Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework

Large Language Models (LLMs) are increasingly applied to automated software testing, yet their ability to generalize beyond memorized patterns and reason about natural language bug reports remains unclear. We present a systematic evaluation…

Software Engineering · Computer Science 2025-10-08 Irtaza Sajid Qureshi , Zhen Ming , Jiang

Bugs in Large Language Models Generated Code: An Empirical Study

Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream in Software Engineering (SE), i.e.,…

Software Engineering · Computer Science 2024-03-19 Florian Tambon , Arghavan Moradi Dakhel , Amin Nikanjam , Foutse Khomh , Michel C. Desmarais , Giuliano Antoniol

Debugging with Open-Source Large Language Models: An Evaluation

Large language models have shown good potential in supporting software development tasks. This is why more and more developers turn to LLMs (e.g. ChatGPT) to support them in fixing their buggy code. While this can save time and effort, many…

Software Engineering · Computer Science 2024-09-06 Yacine Majdoub , Eya Ben Charrada

Are Large Language Models Memorizing Bug Benchmarks?

Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. To evaluate model performance in these domains, numerous bug benchmarks containing real-world…

Software Engineering · Computer Science 2025-04-01 Daniel Ramos , Claudia Mamede , Kush Jain , Paulo Canelas , Catarina Gamboa , Claire Le Goues

Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction

Many automated test generation techniques have been developed to aid developers with writing tests. To facilitate full automation, most existing techniques aim to either increase coverage, or generate exploratory inputs. However, existing…

Software Engineering · Computer Science 2023-07-26 Sungmin Kang , Juyeon Yoon , Shin Yoo

DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models

The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven…

Cryptography and Security · Computer Science 2024-02-26 Berkay Berabi , Alexey Gronskiy , Veselin Raychev , Gishor Sivanrupan , Victor Chibotaru , Martin Vechev

Can LLMs Demystify Bug Reports?

Bugs are notoriously challenging: they slow down software users and result in time-consuming investigations for developers. These challenges are exacerbated when bugs must be reported in natural language by users. Indeed, we lack reliable…

Software Engineering · Computer Science 2023-10-11 Laura Plein , Tegawendé F. Bissyandé

Integrating Large Language Models in Software Engineering Education: A Pilot Study through GitHub Repositories Mining

Context: Large Language Models (LLMs) such as ChatGPT are increasingly adopted in software engineering (SE) education, offering both opportunities and challenges. Their adoption requires systematic investigation to ensure responsible…

Software Engineering · Computer Science 2025-09-08 Maryam Khan , Muhammad Azeem Akbar , Jussi Kasurinen

An Empirical Study on the Capability of LLMs in Decomposing Bug Reports

Background: Bug reports are essential to the software development life cycle. They help developers track and resolve issues, but are often difficult to process due to their complexity, which can delay resolution and affect software quality.…

Software Engineering · Computer Science 2025-04-30 Zhiyuan Chen , Vanessa Nava-Camal , Ahmad Suleiman , Yiming Tang , Daqing Hou , Weiyi Shang

Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions

Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the…

Software Engineering · Computer Science 2023-04-11 Sarah Fakhoury , Saikat Chakraborty , Madan Musuvathi , Shuvendu K. Lahiri

LLMs are Bug Replicators: An Empirical Study on LLMs' Capability in Completing Bug-prone Code

Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, the training data used to develop these models often contain a significant amount of buggy code. Yet, it remains unclear to what extent these…

Software Engineering · Computer Science 2025-03-17 Liwei Guo , Sixiang Ye , Zeyu Sun , Xiang Chen , Yuxia Zhang , Bo Wang , Jie M. Zhang , Zheng Li , Yong Liu

Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models

With the rapid development of large language models (LLMs), distributed training and inference frameworks like DeepSpeed have become essential for scaling model training and inference across multiple GPUs or nodes. However, the increasing…

Software Engineering · Computer Science 2025-06-13 Xiao Yu , Haoxuan Chen , Feifei Niu , Xing Hu , Jacky Wai Keung , Xin Xia

A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair

Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion.…

Software Engineering · Computer Science 2024-04-18 Quanjun Zhang , Tongke Zhang , Juan Zhai , Chunrong Fang , Bowen Yu , Weisong Sun , Zhenyu Chen

Detection of LLM-Generated Java Code Using Discretized Nested Bigrams

Large Language Models (LLMs) are currently used extensively to generate code by professionals and students, motivating the development of tools to detect LLM-generated code for applications such as academic integrity and cybersecurity. We…

Software Engineering · Computer Science 2025-02-25 Timothy Paek , Chilukuri Mohan

LLPut: Investigating Large Language Models for Bug Report-Based Input Generation

Failure-inducing inputs play a crucial role in diagnosing and analyzing software bugs. Bug reports typically contain these inputs, which developers extract to facilitate debugging. Since bug reports are written in natural language, prior…

Software Engineering · Computer Science 2025-12-16 Alif Al Hasan , Subarna Saha , Mia Mohammad Imran , Tarannum Shaila Zaman

GitBug-Java: A Reproducible Benchmark of Recent Java Bugs

Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…

Software Engineering · Computer Science 2024-11-04 André Silva , Nuno Saavedra , Martin Monperrus

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…

Software Engineering · Computer Science 2025-10-20 Shihan Dou , Haoxiang Jia , Shenxi Wu , Huiyuan Zheng , Muling Wu , Yunbo Tao , Ming Zhang , Mingxu Chai , Jessica Fan , Zhiheng Xi , Rui Zheng , Yueming Wu , Ming Wen , Tao Gui , Qi Zhang , Xipeng Qiu , Xuanjing Huang

Large Language Model for Vulnerability Detection: Emerging Results and Future Directions

Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks from scratch. Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable…

Software Engineering · Computer Science 2024-01-30 Xin Zhou , Ting Zhang , David Lo