Related papers: PreciseBugCollector: Extensible, Executable and Pr…

Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems

High-quality and large-scale repositories of real bugs and their concise patches collected from real-world applications are critical for research in the software engineering community. In such a repository, each real bug is explicitly…

Software Engineering · Computer Science 2021-03-02 Yanjie Jiang, Hui Liu, Nan Niu, Lu Zhang, Yamin Hu

EvilCoder: Automated Bug Insertion

The art of finding software vulnerabilities has been covered extensively in the literature and there is a huge body of work on this topic. In contrast, the intentional insertion of exploitable, security-critical bugs has received little…

Cryptography and Security · Computer Science 2020-07-07 Jannik Pewny, Thorsten Holz

BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults…

Software Engineering · Computer Science 2019-07-24 David A. Tomassi, Naji Dmeiri, Yichen Wang, Antara Bhowmick, Yen-Chuan Liu, Premkumar Devanbu, Bogdan Vasilescu, Cindy Rubio-González

BuGL -- A Cross-Language Dataset for Bug Localization

Bug Localization is the process of locating potential error-prone files or methods from a given bug report and source code. There is extensive research on bug localization in the literature that focuses on applying information retrieval…

Software Engineering · Computer Science 2020-04-21 Sandeep Muvva, A Eashaan Rao, Sridhar Chimalakonda

DeepBugs: A Learning Approach to Name-based Bug Detection

Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing…

Software Engineering · Computer Science 2018-05-31 Michael Pradel, Koushik Sen

Categorizing Bugs with Social Networks: A Case Study on Four Open Source Software Communities

Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of…

Software Engineering · Computer Science 2013-03-04 Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone, Frank Schweitzer

SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code.…

Software Engineering · Computer Science 2019-09-12 Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, Martin Monperrus

An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we…

Software Engineering · Computer Science 2019-05-22 Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk

Generating Bug-Fixes Using Pretrained Transformers

Detecting and fixing bugs are two of the most important yet frustrating parts of the software development cycle. Existing bug detection tools are based mainly on static analyzers, which rely on mathematical logic and symbolic reasoning…

Computation and Language · Computer Science 2021-10-04 Dawn Drain, Chen Wu, Alexey Svyatkovskiy, Neel Sundaresan

Mining Bug Repositories for Multi-Fault Programs

Datasets such as Defects4J and BugsInPy that contain bugs from real-world software projects are necessary for a realistic evaluation of automated debugging tools. However, these datasets largely identify only a single bug in each entry,…

Software Engineering · Computer Science 2024-04-11 Dylan Callaghan, Bernd Fischer

BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning

Software bugs require developers to exert significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is…

Software Engineering · Computer Science 2025-06-24 Partha Chakraborty, Mahmoud Alfadel, Meiyappan Nagappan

REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious…

Software Engineering · Computer Science 2023-09-18 Chaozheng Wang, Zongjie Li, Yun Peng, Shuzheng Gao, Sirong Chen, Shuai Wang, Cuiyun Gao, Michael R. Lyu

BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis

Hardware complexity continues to strain verification resources, motivating the adoption of machine learning (ML) methods to improve debug efficiency. However, ML-assisted debugging critically depends on diverse and scalable bug datasets,…

Software Engineering · Computer Science 2025-06-19 Surya Jasper, Minh Luu, Evan Pan, Aakash Tyagi, Michael Quinn, Jiang Hu, David Kebo Houngninou

An Approach to Detecting Bugs in Pattern-Based Bug Detectors

Static bug finders have been widely adopted by developers to find bugs in real-world software projects. They leverage predefined heuristic static analysis rules to scan source code or binary code of a software project, and report violations…

Software Engineering · Computer Science 2021-12-24 Junjie Wang, Yuchao Huang, Song Wang, Qing Wang

GitBugs: Bug Reports for Duplicate Detection, Retrieval Augmented Generation, Triage, and More

Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these limitations, we present GitBugs, a…

Software Engineering · Computer Science 2026-04-30 Avinash Patil, Siru Tao, Aryan Jadon

RegMiner: Towards Constructing a Large Regression Dataset from Code Evolution History

Bug datasets consisting of real-world bugs are important artifacts for researchers and programmers, which lay empirical and experimental foundation for various SE/PL research such as fault localization, software testing, and program repair.…

Software Engineering · Computer Science 2022-07-05 Xuezhi Song, Yun Lin, Siang Hwee Ng, Yijian Wu, Xin Peng, Jin Song Dong, Hong Mei

A Systematic Impact Study for Fuzzer-Found Compiler Bugs

Despite much recent interest in compiler randomized testing (fuzzing), the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the…

Software Engineering · Computer Science 2019-09-06 Michaël Marcozzi, Qiyi Tang, Alastair F. Donaldson, Cristian Cadar

Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

Unlike code completion, debugging requires localizing faults and applying targeted edits. We observe that frontier LLMs often regenerate correct but over-edited solutions during debugging. To evaluate how far LLMs are from precise…

Software Engineering · Computer Science 2026-05-19 Wang Bill Zhu, Miaosen Chai, Shangshang Wang, Yejia Liu, Song Bian, Honghua Dong, Willie Neiswanger, Robin Jia

DeepDiagnosis: Automatically Diagnosing Faults and Recommending Actionable Fixes in Deep Learning Programs

Deep Neural Networks (DNNs) are used in a wide variety of applications. However, as in any software application, DNN-based apps are afflicted with bugs. Previous work observed that DNN bug fix patterns are different from traditional bug fix…

Software Engineering · Computer Science 2021-12-09 Mohammad Wardat, Breno Dantas Cruz, Wei Le, Hridesh Rajan

PerfCurator: Curating a large-scale dataset of performance bug-related commits from public repositories

Performance bugs challenge software development, degrading performance and wasting computational resources. Software developers invest substantial effort in addressing these issues. Curating these performance bugs can offer valuable…

Software Engineering · Computer Science 2024-06-18 Md Abul Kalam Azad, Manoj Alexender, Matthew Alexender, Syed Salauddin Mohammad Tariq, Foyzul Hassan, Probir Roy