Related papers: Bears: An Extensible Java Bug Benchmark for Automa…
Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…
Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults…
In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are…
Bug-fix benchmarks are fundamental in advancing various sub-fields of software engineering such as automatic program repair (APR) and fault localization (FL). A good benchmark must include recent examples that accurately reflect…
Continuous Integration (CI) enforces repository-level correctness through multi-stage workflows and is central to modern software development, yet diagnosing and repairing CI failures remains challenging. Unlike traditional program repair,…
Hot fixes are urgent, unplanned changes deployed to production systems to address time-critical issues. Despite their importance, no existing evaluation benchmark focuses specifically on hot fixes. We present HotBugs$.$jar, the first…
Context: Bug bisection is a common technique used to identify a revision that introduces a bug or indirectly fixes a bug, and often involves executing multiple revisions of a project to determine whether the bug is present within the…
Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these limitations, we present GitBugs-a…
Benchmarks play an important role in evaluating the efficiency and effectiveness of solutions to automate several phases of the software development lifecycle. Moreover, if well designed, they also serve us well as an important artifact to…
Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J is provided with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to…
Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program…
Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore…
Detecting and fixing bugs are two of the most important yet frustrating parts of the software development cycle. Existing bug detection tools are based mainly on static analyzers, which rely on mathematical logic and symbolic reasoning…
High-quality and large-scale repositories of real bugs and their concise patches collected from real-world applications are critical for research in software engineering community. In such a repository, each real bug is explicitly…
Incremental and parallel builds are crucial features of modern build systems. Parallelism enables fast builds by running independent tasks simultaneously, while incrementality saves time and computing resources by processing the build…
In the research of automated program repair (APR), benchmark datasets consisting of known defects in combination with test suites that indicate the defects are of high importance. They allow for an evidence-based comparison of different APR…
In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes,…
Large repositories of source code for research tend to limit their utility to static analysis of the code, as they give no guarantees on whether the projects are compilable, much less runnable in any way. The immediate consequence of the…
Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of…
Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic…