Related papers: LLMorpheus: Mutation Testing using Large Language …
Large Language Models (LLMs) have recently been used to generate mutants in both research work and in industrial practice. However, there has been no comprehensive empirical study of their performance for this increasingly important…
Large Language Models (LLMs) can generate plausible test code. Intuitively they generate this by imitating tests seen in their training data, rather than reasoning about execution semantics. However, such reasoning is important when…
Code Large Language Models (CLLMs) have exhibited outstanding performance in program synthesis, attracting the focus of the research community. The evaluation of CLLM's program synthesis capability has generally relied on manually curated…
One of the critical phases in software development is software testing. Testing helps with identifying potential bugs and reducing maintenance costs. The goal of automated test generation tools is to ease the development of tests by…
Large Language Models (LLMs) have shown remarkable capabilities in processing both natural and programming languages, which have enabled various applications in software engineering, such as requirement engineering, code generation, and…
Evaluating software engineering capabilities has become a core component of modern large language models (LLMs); however, the key bottleneck hindering further scaling lies not in the scarcity of high-quality solutions, but in the lack of…
Mutation testing is a well-established technique for assessing a test suite's quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep…
Various proxy metrics for test quality have been defined in order to guide developers when writing tests. Code coverage is particularly well established in practice, even though the question of how coverage relates to test quality is a…
Many automated test generation techniques have been developed to aid developers with writing tests. To facilitate full automation, most existing techniques aim to either increase coverage, or generate exploratory inputs. However, existing…
Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults; yet they also indicated a…
Large language models (LLMs) have recently achieved significant success across various application domains, garnering substantial attention from different communities. Unfortunately, even for the best LLM, many \textit{faults} still exist…
Generative Large Language Models (LLMs) are increasingly used in non-generative software maintenance tasks, such as fault localization (FL). Success in FL depends on a models ability to reason about program semantics beyond surface-level…
Large Language Models (LLMs) are starting to be profiled as one of the most significant disruptions in the Software Testing field. Specifically, they have been successfully applied in software testing tasks such as generating test code, or…
This paper describes Mull, an open-source tool for mutation testing based on the LLVM framework. Mull works with LLVM IR, a low-level intermediate representation, to perform mutations, and uses LLVM JIT for just-in-time compilation. This…
Mutation analysis assesses a test suite's adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria.…
Mutation testing is an approach to check the robustness of test suites. The program code is slightly changed by mutations to inject errors. A test suite is robust enough if it finds such errors. Tools for mutation testing usually integrate…
Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation.…
Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…
Large Language Models (LLMs) and Multi-Agent LLMs (MALLMs) introduce non-determinism unlike traditional or machine learning software, requiring new approaches to verifying correctness beyond simple output comparisons or statistical accuracy…
A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content, including grammatically precise sentences, human-like paragraphs, and syntactically accurate code snippets. LLMs can play…