Related papers: Tests4Py: A Benchmark for System Testing

200 papers

The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately,…

Realistic benchmarks of reproducible bugs and fixes are vital to good experimental evaluation of debugging and testing approaches. However, there is no suitable benchmark suite that can systematically evaluate the debugging and testing…

Software Engineering · Computer Science 2021-09-22 Pengzhan Zhao , Jianjun Zhao , Zhongtao Miao , Shuhan Lan

The rapid escalation of applying Machine Learning (ML) in various domains has led to paying more attention to the quality of ML components. There is then a growth of techniques and tools aiming at improving the quality of ML components and…

Software Engineering · Computer Science 2023-01-18 Mohammad Mehdi Morovati , Amin Nikanjam , Foutse Khomh , Zhen Ming Jiang

Datasets such as Defects4J and BugsInPy that contain bugs from real-world software projects are necessary for a realistic evaluation of automated debugging tools. However these datasets largely identify only a single bug in each entry,…

Software Engineering · Computer Science 2024-04-11 Dylan Callaghan , Bernd Fischer

Benchmarks play an important role in evaluating the efficiency and effectiveness of solutions to automate several phases of the software development lifecycle. Moreover, if well designed, they also serve us well as an important artifact to…

Software Engineering · Computer Science 2019-05-24 Thomas Durieux , Rui Abreu

Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, making them particularly challenging to detect and fix. While recent advances in Software Engineering agents have shown…

Software Engineering · Computer Science 2025-12-04 Spandan Garg , Roshanak Zilouchian Moghaddam , Neel Sundaresan

Developers create bug-reproducing tests that support debugging by failing as long as the bug is present, and passing once the bug has been fixed. These tests are usually integrated into existing test suites and executed regularly alongside…

Software Engineering · Computer Science 2026-02-04 Andre Hora , Gordon Fraser

Software defect datasets, which are collections of software bugs, are essential resources to facilitate empirical research and enable standardized benchmarking for a wide range of software engineering techniques, including emerging areas…

Software Engineering · Computer Science 2026-02-12 Hao-Nan Zhu , Robert M. Furth , Michael Pradel , Cindy Rubio-González

In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are…

Software Engineering · Computer Science 2019-05-29 Thomas Durieux , Fernanda Madeiral , Matias Martinez , Rui Abreu

Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…

Software Engineering · Computer Science 2024-11-04 André Silva , Nuno Saavedra , Martin Monperrus

Large language models (LLMs) have demonstrated strong performance on a wide range of software engineering tasks, including code generation and analysis. However, most prior work relies on cloud-based models or specialized hardware, limiting…

Software Engineering · Computer Science 2026-04-28 Jelena Ilić Vulićević

Reproducibility and comparability of empirical results are at the core tenet of the scientific method in any scientific field. To ease reproducibility of empirical studies, several benchmarks in software engineering research, such as…

Software Engineering · Computer Science 2021-04-01 José Campos , André Souto

Software auditing is an increasingly critical task in the era of rapid code generation. While LLM-based auditors have demonstrated strong potential, their effectiveness remains limited by misalignment with the highly complex,…

Software Engineering · Computer Science 2026-04-16 Jinyao Guo , Chengpeng Wang , Dominic Deluca , Jinjie Liu , Zhuo Zhang , Xiangyu Zhang

Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…

Software Engineering · Computer Science 2024-06-07 Runchu Tian , Yining Ye , Yujia Qin , Xin Cong , Yankai Lin , Yinxu Pan , Yesai Wu , Haotian Hui , Weichuan Liu , Zhiyuan Liu , Maosong Sun

Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of…

Software Engineering · Computer Science 2013-03-04 Marcelo Serrano Zanetti , Ingo Scholtes , Claudio Juan Tessone , Frank Schweitzer

As the adoption of Deep Learning (DL) systems continues to rise, an increasing number of approaches are being proposed to test these systems, localise faults within them, and repair those faults. The best attestation of effectiveness for…

Software Engineering · Computer Science 2024-12-24 Gunel Jahangirova , Nargiz Humbatova , Jinhan Kim , Shin Yoo , Paolo Tonella

Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission critical applications, financial, and real time systems. In this work we focused on early detection of performance bugs; our…

Software Engineering · Computer Science 2017-02-28 Sokratis Tsakiltsidis , Andriy Miranskyy , Elie Mazzawi

Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program…

Software Engineering · Computer Science 2025-02-04 Wenhan Wang , Chenyuan Yang , Zhijie Wang , Yuheng Huang , Zhaoyang Chu , Da Song , Lingming Zhang , An Ran Chen , Lei Ma

Software bugs significantly contribute to software cost and increase the risk of system malfunctioning. In recent years, many automated program-repair approaches have been proposed to automatically fix undesired program behavior. Despite of…

Software Engineering · Computer Science 2021-07-19 Dirk Beyer , Lars Grunske , Thomas Lemberger , Minxing Tang

Software is used in critical applications in our day-to-day life and it is important to ensure its correctness. One popular approach to assess correctness is to evaluate software on tests. If a test fails, it indicates a fault in the…

Software Engineering · Computer Science 2025-04-01 Max Hort , Leon Moonen