Related papers: BugsInPy: A Database of Existing Bugs in Python Pr…
Benchmarks are among the main drivers of progress in software engineering research. However, many current benchmarks are limited by inadequate system oracles and sparse unit tests. Our Tests4Py benchmark, derived from the BugsInPy…
In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are…
Datasets such as Defects4J and BugsInPy that contain bugs from real-world software projects are necessary for a realistic evaluation of automated debugging tools. However these datasets largely identify only a single bug in each entry,…
Defects4J has enabled numerous software testing and debugging research work since its introduction. A large part of its contribution, and the resulting popularity, lies in the clear separation and distillation of the root cause of each…
Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…
Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J is provided with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to…
In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes,…
The context of this work is specification, detection and ultimately removal of detectable harmful patterns in source code that are associated with defects in design and implementation of software. In particular, we investigate five code…
Fixing software faults contributes significantly to the cost of software maintenance and evolution. Techniques for reducing these costs require datasets of software faults, as well as an understanding of the faults, for optimal testing and…
Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore…
Despite its massive popularity as a programming language, especially in novel domains like data science programs, there is comparatively little research about fault localization that targets Python. Even though it is plausible that several…
Software defect datasets, which are collections of software bugs, are essential resources to facilitate empirical research and enable standardized benchmarking for a wide range of software engineering techniques, including emerging areas…
Tests that cause spurious failures without any code changes, i.e., flaky tests, hamper regression testing, increase maintenance costs, may shadow real bugs, and decrease trust in tests. While the prevalence and importance of flakiness is…
Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, there has been no thorough study to understand Jupyter…
For software testing research, Defects4J stands out as the primary benchmark dataset, offering a controlled environment to study real bugs from prominent open-source systems. However, prior research indicates that Defects4J might include…
R and Python are among the most popular languages used in many critical data analytics tasks. However, we still do not fully understand the capabilities of these two languages w.r.t. bugs encountered in data analytics tasks. What type of…
The rapid escalation of applying Machine Learning (ML) in various domains has led to paying more attention to the quality of ML components. There is then a growth of techniques and tools aiming at improving the quality of ML components and…
Well-designed and publicly available datasets of bugs are an invaluable asset to advance research fields such as fault localization and program repair as they allow directly and fairly comparison between competing techniques and also the…
As the adoption of Deep Learning (DL) systems continues to rise, an increasing number of approaches are being proposed to test these systems, localise faults within them, and repair those faults. The best attestation of effectiveness for…
While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and investigation of runtime…