Related papers: Reproduction Test Generation for Java SWE Issues
A software engineering issue (SWE issue) is easier to resolve when accompanied by a reproduction test. Unfortunately, most issues do not come with functioning reproduction tests, so this paper explores how to generate them automatically.…
While there has been plenty of work on generating tests from existing code, there has been limited work on generating tests from issues. A correct test must validate the code patch that resolves the issue. This paper focuses on the scenario…
Software testing is crucial for ensuring the correctness and reliability of software systems. Automated generation of issue reproduction tests from natural language issue descriptions enhances developer productivity by simplifying root…
Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…
Identifying the root cause of a bug remains difficult for many developers because bug reports often lack a bug reproducing test case that reliably triggers the failure. Manually writing such test cases is time-consuming and requires…
GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large…
Software testing is one of the very important Quality Assurance (QA) components. A lot of researchers deal with the testing process in terms of tester motivation and how tests should or should not be written. However, it is not known from…
Rotten green tests are passing tests which have, at least, one assertion not executed. They give developers a false confidence. In this paper, we present, RTj, a framework that analyzes test cases from Java projects with the goal of…
Test suites in real-world projects are often large and achieve high code coverage, yet they remain insufficient for detecting all bugs. The abundance of unresolved issues in open-source project trackers highlights this gap. While regression…
Large repositories of source code for research tend to limit their utility to static analysis of the code, as they give no guarantees on whether the projects are compilable, much less runnable in any way. The immediate consequence of the…
Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there…
Test-driven development (TDD) is the practice of writing tests first and coding later, and the proponents of TDD expound its numerous benefits. For instance, given an issue on a source code repository, tests can clarify the desired behavior…
Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and the description of the programming…
In the research of automated program repair (APR), benchmark datasets consisting of known defects in combination with test suites that indicate the defects are of high importance. They allow for an evidence-based comparison of different APR…
Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a specification of the intended behavior of the…
Benchmarks like SWE-bench have standardized the evaluation of Large Language Models (LLMs) on repository-level software engineering tasks. However, these efforts remain limited by manual curation, static datasets, and a focus on…
Benchmarks of bugs are essential to empirically evaluate automatic program repair tools. In this paper, we present Bears, a project for collecting and storing bugs into an extensible bug benchmark for automatic repair studies in Java. The…
Third-party dependency updates can cause a build to fail if the new dependency version introduces a change that is incompatible with the usage: this is called a breaking dependency update. Research on breaking dependency updates is active,…
Documenting the functionality of software units with code comments, e.g., Javadoc comments, is a common programmer best-practice in software engineering. This paper introduces a novel test generation technique that exploits the code-comment…
Various fields of science face a reproducibility crisis. For quantum software engineering as an emerging field, it is therefore imminent to focus on proper reproducibility engineering from the start. Yet the provision of reproduction…