Related papers: BugsInPy: A Database of Existing Bugs in Python Pr…

Tests4Py: A Benchmark for System Testing

Benchmarks are among the main drivers of progress in software engineering research. However, many current benchmarks are limited by inadequate system oracles and sparse unit tests. Our Tests4Py benchmark, derived from the BugsInPy…

Software Engineering · Computer Science 2024-05-15 Marius Smytzek, Martin Eberlein, Batuhan Serce, Lars Grunske, Andreas Zeller

Empirical Review of Java Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts

In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are…

Software Engineering · Computer Science 2019-05-29 Thomas Durieux, Fernanda Madeiral, Matias Martinez, Rui Abreu

Mining Bug Repositories for Multi-Fault Programs

Datasets such as Defects4J and BugsInPy that contain bugs from real-world software projects are necessary for a realistic evaluation of automated debugging tools. However, these datasets largely identify only a single bug in each entry,…

Software Engineering · Computer Science 2024-04-11 Dylan Callaghan, Bernd Fischer

Searching for Multi-Fault Programs in Defects4J

Defects4J has enabled numerous software testing and debugging research work since its introduction. A large part of its contribution, and the resulting popularity, lies in the clear separation and distillation of the root cause of each…

Software Engineering · Computer Science 2021-08-11 Gabin An, Juyeon Yoon, Shin Yoo

GitBug-Java: A Reproducible Benchmark of Recent Java Bugs

Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…

Software Engineering · Computer Science 2024-11-04 André Silva, Nuno Saavedra, Martin Monperrus

Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J is provided with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to…

Software Engineering · Computer Science 2015-12-24 Matias Martinez, Thomas Durieux, Jifeng Xuan, Romain Sommerard, Martin Monperrus

An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis

In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes,…

Software Engineering · Computer Science 2024-08-06 Ehsan Mashhadi, Shaiful Chowdhury, Somayeh Modaberi, Hadi Hemmati, Gias Uddin

Does Python Smell Like Java? Tool Support for Design Defect Discovery in Python

The context of this work is specification, detection and ultimately removal of detectable harmful patterns in source code that are associated with defects in design and implementation of software. In particular, we investigate five code…

Software Engineering · Computer Science 2017-04-03 Nicole Vavrová, Vadim Zaytsev

Empirical Analysis of Temporal and Spatial Fault Characteristics in Multi-Fault Bug Repositories

Fixing software faults contributes significantly to the cost of software maintenance and evolution. Techniques for reducing these costs require datasets of software faults, as well as an understanding of the faults, for optimal testing and…

Software Engineering · Computer Science 2025-08-13 Dylan Callaghan, Alexandra van der Spuy, Bernd Fischer

Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore…

Software Engineering · Computer Science 2021-11-12 Matias Martinez, Thomas Durieux, Romain Sommerard, Jifeng Xuan, Martin Monperrus

An Empirical Study of Fault Localization in Python Programs

Despite its massive popularity as a programming language, especially in novel domains like data science programs, there is comparatively little research about fault localization that targets Python. Even though it is plausible that several…

Software Engineering · Computer Science 2024-10-03 Mohammad Rezaalipour, Carlo A. Furia

From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets

Software defect datasets, which are collections of software bugs, are essential resources to facilitate empirical research and enable standardized benchmarking for a wide range of software engineering techniques, including emerging areas…

Software Engineering · Computer Science 2026-02-12 Hao-Nan Zhu, Robert M. Furth, Michael Pradel, Cindy Rubio-González

An Empirical Study of Flaky Tests in Python

Tests that cause spurious failures without any code changes, i.e., flaky tests, hamper regression testing, increase maintenance costs, may shadow real bugs, and decrease trust in tests. While the prevalence and importance of flakiness is…

Software Engineering · Computer Science 2022-02-15 Martin Gruber, Stephan Lukasczyk, Florian Kroiß, Gordon Fraser

Bug Analysis in Jupyter Notebook Projects: An Empirical Study

Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, there has been no thorough study to understand Jupyter…

Software Engineering · Computer Science 2022-10-14 Taijara Loiola de Santana, Paulo Anselmo da Mota Silveira Neto, Eduardo Santana de Almeida, Iftekhar Ahmed

Back to the Future! Studying Data Cleanness in Defects4J and its Impact on Fault Localization

For software testing research, Defects4J stands out as the primary benchmark dataset, offering a controlled environment to study real bugs from prominent open-source systems. However, prior research indicates that Defects4J might include…

Software Engineering · Computer Science 2024-08-09 Md Nakhla Rafi, An Ran Chen, Tse-Hsun Chen, Shaohua Wang

Characterizing Bugs in Python and R Data Analytics Programs

R and Python are among the most popular languages used in many critical data analytics tasks. However, we still do not fully understand the capabilities of these two languages w.r.t. bugs encountered in data analytics tasks. What type of…

Software Engineering · Computer Science 2023-06-16 Shibbir Ahmed, Mohammad Wardat, Hamid Bagheri, Breno Dantas Cruz, Hridesh Rajan

Bugs in Machine Learning-based Systems: A Faultload Benchmark

The rapid escalation of applying Machine Learning (ML) in various domains has led to paying more attention to the quality of ML components. There is then a growth of techniques and tools aiming at improving the quality of ML components and…

Software Engineering · Computer Science 2023-01-18 Mohammad Mehdi Morovati, Amin Nikanjam, Foutse Khomh, Zhen Ming (Jack) Jiang

Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J

Well-designed and publicly available datasets of bugs are an invaluable asset to advance research fields such as fault localization and program repair as they allow direct and fair comparison between competing techniques and also the…

Software Engineering · Computer Science 2021-11-12 Victor Sobreira, Thomas Durieux, Fernanda Madeiral, Martin Monperrus, Marcelo A. Maia

Real Faults in Deep Learning Fault Benchmarks: How Real Are They?

As the adoption of Deep Learning (DL) systems continues to rise, an increasing number of approaches are being proposed to test these systems, localise faults within them, and repair those faults. The best attestation of effectiveness for…

Software Engineering · Computer Science 2024-12-24 Gunel Jahangirova, Nargiz Humbatova, Jinhan Kim, Shin Yoo, Paolo Tonella

Debug2Fix: Can Interactive Debugging Help Coding Agents Fix More Bugs?

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and investigation of runtime…

Software Engineering · Computer Science 2026-04-22 Spandan Garg, Yufan Huang