Related papers: Bears: An Extensible Java Bug Benchmark for Automa…

GitBug-Java: A Reproducible Benchmark of Recent Java Bugs

Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…

Software Engineering · Computer Science 2024-11-04 André Silva , Nuno Saavedra , Martin Monperrus

BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults…

Software Engineering · Computer Science 2019-07-24 David A. Tomassi , Naji Dmeiri , Yichen Wang , Antara Bhowmick , Yen-Chuan Liu , Premkumar Devanbu , Bogdan Vasilescu , Cindy Rubio-González

Empirical Review of Java Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts

In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are…

Software Engineering · Computer Science 2019-05-29 Thomas Durieux , Fernanda Madeiral , Matias Martinez , Rui Abreu

GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions

Bug-fix benchmarks are fundamental in advancing various sub-fields of software engineering such as automatic program repair (APR) and fault localization (FL). A good benchmark must include recent examples that accurately reflect…

Software Engineering · Computer Science 2024-03-15 Nuno Saavedra , André Silva , Martin Monperrus

CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows

Continuous Integration (CI) enforces repository-level correctness through multi-stage workflows and is central to modern software development, yet diagnosing and repairing CI failures remains challenging. Unlike traditional program repair,…

Software Engineering · Computer Science 2026-05-06 Rabeya Khatun Muna , Md Nakhla Rafi , Tse-Hsun Chen

HotBugs.jar: A Benchmark of Hot Fixes for Time-Critical Bugs

Hot fixes are urgent, unplanned changes deployed to production systems to address time-critical issues. Despite their importance, no existing evaluation benchmark focuses specifically on hot fixes. We present HotBugs.jar, the first…

Software Engineering · Computer Science 2025-10-10 Carol Hanna , Federica Sarro , Mark Harman , Justyna Petke

Automatic Build Repair for Test Cases using Incompatible Java Versions

Context: Bug bisection is a common technique used to identify a revision that introduces a bug or indirectly fixes a bug, and often involves executing multiple revisions of a project to determine whether the bug is present within the…

Software Engineering · Computer Science 2024-05-06 Ching Hang Mak , Shing-Chi Cheung

GitBugs: Bug Reports for Duplicate Detection, Retrieval Augmented Generation, Triage, and More

Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these limitations, we present GitBugs, a…

Software Engineering · Computer Science 2026-04-30 Avinash Patil , Siru Tao , Aryan Jadon

Critical Review of BugSwarm for Fault Localization and Program Repair

Benchmarks play an important role in evaluating the efficiency and effectiveness of solutions to automate several phases of the software development lifecycle. Moreover, if well designed, they also serve us well as an important artifact to…

Software Engineering · Computer Science 2019-05-24 Thomas Durieux , Rui Abreu

Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J is provided with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to…

Software Engineering · Computer Science 2015-12-24 Matias Martinez , Thomas Durieux , Jifeng Xuan , Romain Sommerard , Martin Monperrus

On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program…

Software Engineering · Computer Science 2020-08-04 Kui Liu , Shangwen Wang , Anil Koyuncu , Kisub Kim , Tegawendé F. Bissyandé , Dongsun Kim , Peng Wu , Jacques Klein , Xiaoguang Mao , Yves Le Traon

Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore…

Software Engineering · Computer Science 2021-11-12 Matias Martinez , Thomas Durieux , Romain Sommerard , Jifeng Xuan , Martin Monperrus

Generating Bug-Fixes Using Pretrained Transformers

Detecting and fixing bugs are two of the most important yet frustrating parts of the software development cycle. Existing bug detection tools are based mainly on static analyzers, which rely on mathematical logic and symbolic reasoning…

Computation and Language · Computer Science 2021-10-04 Dawn Drain , Chen Wu , Alexey Svyatkovskiy , Neel Sundaresan

Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems

High-quality and large-scale repositories of real bugs and their concise patches collected from real-world applications are critical for research in software engineering community. In such a repository, each real bug is explicitly…

Software Engineering · Computer Science 2021-03-02 Yanjie Jiang , Hui Liu , Nan Niu , Lu Zhang , Yamin Hu

Identifying Bugs in Make and JVM-Oriented Builds

Incremental and parallel builds are crucial features of modern build systems. Parallelism enables fast builds by running independent tasks simultaneously, while incrementality saves time and computing resources by processing the build…

Software Engineering · Computer Science 2023-12-05 Thodoris Sotiropoulos , Stefanos Chaliasos , Dimitris Mitropoulos , Diomidis Spinellis

Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset

In the research of automated program repair (APR), benchmark datasets consisting of known defects in combination with test suites that indicate the defects are of high importance. They allow for an evidence-based comparison of different APR…

Software Engineering · Computer Science 2026-04-30 Adam Krafczyk , Klaus Schmid

An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis

In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes,…

Software Engineering · Computer Science 2024-08-06 Ehsan Mashhadi , Shaiful Chowdhury , Somayeh Modaberi , Hadi Hemmati , Gias Uddin

The Java Build Framework: Large Scale Compilation

Large repositories of source code for research tend to limit their utility to static analysis of the code, as they give no guarantees on whether the projects are compilable, much less runnable in any way. The immediate consequence of the…

Software Engineering · Computer Science 2018-04-13 Pedro Martins , Rohan Achar , Cristina V. Lopes

Categorizing Bugs with Social Networks: A Case Study on Four Open Source Software Communities

Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of…

Software Engineering · Computer Science 2013-03-04 Marcelo Serrano Zanetti , Ingo Scholtes , Claudio Juan Tessone , Frank Schweitzer

A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark

Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic…

Software Engineering · Computer Science 2020-09-29 He Ye , Matias Martinez , Thomas Durieux , Martin Monperrus