Related papers: HaPy-Bug -- Human Annotated Python Bug Resolution …

An Annotated Dataset of Stack Overflow Post Edits

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher…

Software Engineering · Computer Science 2020-05-07 Sebastian Baltes, Markus Wagner

BugsRepo: A Comprehensive Curated Dataset of Bug Reports, Comments and Contributors Information from Bugzilla

Bug reports help software development teams enhance software quality, yet their utility is often compromised by unclear or incomplete information. This issue not only hinders developers' ability to quickly understand and resolve bugs but…

Software Engineering · Computer Science 2025-04-29 Jagrit Acharya, Gouri Ginde

Hints Help Finding and Fixing Bugs Differently in Python and Text-based Program Representations

With the recent advances in AI programming assistants such as GitHub Copilot, programming is not limited to classical programming languages anymore--programming tasks can also be expressed and solved by end-users in natural text. Despite…

Software Engineering · Computer Science 2024-12-18 Ruchit Rawal, Victor-Alexandru Pădurean, Sven Apel, Adish Singla, Mariya Toneva

PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection

This paper presents PyResBugs, a curated dataset of residual bugs, i.e., defects that persist undetected during traditional testing but later surface in production, collected from major Python frameworks. Each bug in the dataset is paired…

Software Engineering · Computer Science 2025-05-12 Domenico Cotroneo, Giuseppe De Rosa, Pietro Liguori

GitBugs: Bug Reports for Duplicate Detection, Retrieval Augmented Generation, Triage, and More

Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these limitations, we present GitBugs-a…

Software Engineering · Computer Science 2026-04-30 Avinash Patil, Siru Tao, Aryan Jadon

Bugs in the Shadows: Static Detection of Faulty Python Refactorings

Python is a widely adopted programming language, valued for its simplicity and flexibility. However, its dynamic type system poses significant challenges for automated refactoring - an essential practice in software evolution aimed at…

Software Engineering · Computer Science 2025-11-20 Jonhnanthan Oliveira, Rohit Gheyi, Márcio Ribeiro, Alessandro Garcia

Source Code Hotspots: A Diagnostic Method for Quality Issues

Software source code often harbours "hotspots": small portions of the code that change far more often than the rest of the project and thus concentrate maintenance activity. We mine the complete version histories of 91 evolving, actively…

Software Engineering · Computer Science 2026-02-16 Saleha Muzammil, Mughees Ur Rehman, Zoe Kotti, Diomidis Spinellis

A Data Set of Generalizable Python Code Change Patterns

Mining repetitive code changes from version control history is a common way of discovering unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While there are such…

Software Engineering · Computer Science 2023-04-12 Akalanka Galappaththi, Sarah Nadi

PyTy: Repairing Static Type Errors in Python

Gradual typing enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type…

Software Engineering · Computer Science 2024-01-15 Yiu Wai Chow, Luca Di Grazia, Michael Pradel

Why are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion

Software developers attempt to reproduce software bugs to understand their erroneous behaviours and to fix them. Unfortunately, they often fail to reproduce (or fix) them, which leads to faulty, unreliable software systems. However, to…

Software Engineering · Computer Science 2021-08-12 Mohammad Masudur Rahman, Foutse Khomh, Marco Castelluccio

Harvesting Fix Hints in the History of Bugs

In software development, fixing bugs is an important task that is time consuming and cost-sensitive. While many approaches have been proposed to automatically detect and patch software code, the strategies are limited to a set of identified…

Software Engineering · Computer Science 2015-07-22 Tegawendé F. Bissyandé

An Investigation of Hardware Security Bug Characteristics in Open-Source Projects

Hardware security is an important concern of system security as vulnerabilities can arise from design errors introduced throughout the development lifecycle. Recent works have proposed techniques to detect hardware security bugs, such as…

Cryptography and Security · Computer Science 2024-02-02 Joey Ah-kiow, Benjamin Tan

An Automatically Created Novel Bug Dataset and its Validation in Bug Prediction

Bugs are inescapable during software development due to frequent code changes, tight deadlines, etc.; therefore, it is important to have tools to find these errors. One way of performing bug identification is to analyze the characteristics…

Software Engineering · Computer Science 2020-06-19 Rudolf Ferenc, Péter Gyimesi, Gábor Gyimesi, Zoltán Tóth, Tibor Gyimóthy

The Impact Of Bug Localization Based on Crash Report Mining: A Developers' Perspective

Developers often use crash reports to understand the root cause of bugs. However, locating the buggy source code snippet from such information is a challenging task, mainly when the log database contains many crash reports. To mitigate this…

Software Engineering · Computer Science 2024-03-19 Marcos Medeiros, Uirá Kulesza, Roberta Coelho, Rodrigo Bonifácio, Christoph Treude, Eiji Adachi

Hot Fixing in the Wild

Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analyse hot fixes present in over 61,000…

Software Engineering · Computer Science 2026-04-30 Carol Hanna, Karine Even-Mendoza, W. B. Langdon, Mar Zamorano López, Justyna Petke, Federica Sarro

BUGSPHP: A dataset for Automated Program Repair in PHP

Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on bugs in PHP programs due to the lack of…

Software Engineering · Computer Science 2024-01-23 K. D. Pramod, W. T. N. De Silva, W. U. K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake

Analyzing the Context of Bug-Fixing Changes in the OpenStack Cloud Computing Platform

Many research areas in software engineering, such as mutation testing, automatic repair, fault localization, and fault injection, rely on empirical knowledge about recurring bug-fixing code changes. Previous studies in this field focus on…

Software Engineering · Computer Science 2019-08-30 Domenico Cotroneo, Luigi De Simone, Antonio Ken Iannillo, Roberto Natella, Stefano Rosiello, Nematollah Bidokhti

From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets

Software defect datasets, which are collections of software bugs, are essential resources to facilitate empirical research and enable standardized benchmarking for a wide range of software engineering techniques, including emerging areas…

Software Engineering · Computer Science 2026-02-12 Hao-Nan Zhu, Robert M. Furth, Michael Pradel, Cindy Rubio-González

Self-Supervised Bug Detection and Repair

Machine learning-based program analyses have recently shown the promise of integrating formal and probabilistic reasoning towards aiding software development. However, in the absence of large annotated corpora, training these analyses is…

Machine Learning · Computer Science 2021-11-17 Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt

Towards Automated Performance Bug Identification in Python

Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission critical applications, financial, and real time systems. In this work we focused on early detection of performance bugs; our…

Software Engineering · Computer Science 2017-02-28 Sokratis Tsakiltsidis, Andriy Miranskyy, Elie Mazzawi