Related papers: LLMorpheus: Mutation Testing using Large Language …

A Comprehensive Study on Large Language Models for Mutation Testing

Large Language Models (LLMs) have recently been used to generate mutants in both research work and in industrial practice. However, there has been no comprehensive empirical study of their performance for this increasingly important…

Software Engineering · Computer Science 2026-01-23 Bo Wang, Mingda Chen, Ming Deng, Youfang Lin, Mark Harman, Mike Papadakis, Jie M. Zhang

Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging

Large Language Models (LLMs) can generate plausible test code. Intuitively they generate this by imitating tests seen in their training data, rather than reasoning about execution semantics. However, such reasoning is important when…

Software Engineering · Computer Science 2025-03-12 Philipp Straubinger, Marvin Kreis, Stephan Lukasczyk, Gordon Fraser

Benchmarking and Revisiting Code Generation Assessment: A Mutation-Based Approach

Code Large Language Models (CLLMs) have exhibited outstanding performance in program synthesis, attracting the focus of the research community. The evaluation of CLLMs' program synthesis capability has generally relied on manually curated…

Software Engineering · Computer Science 2025-05-13 Longtian Wang, Tianlin Li, Xiaofei Xie, Yuhan Zhi, Jian Wang, Chao Shen

Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing

One of the critical phases in software development is software testing. Testing helps with identifying potential bugs and reducing maintenance costs. The goal of automated test generation tools is to ease the development of tests by…

Software Engineering · Computer Science 2023-09-01 Arghavan Moradi Dakhel, Amin Nikanjam, Vahid Majdinasab, Foutse Khomh, Michel C. Desmarais

Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs

Large Language Models (LLMs) have shown remarkable capabilities in processing both natural and programming languages, which have enabled various applications in software engineering, such as requirement engineering, code generation, and…

Software Engineering · Computer Science 2024-01-12 Ziyu Li, Donghwan Shin

SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

Evaluating software engineering capabilities has become a core component of modern large language models (LLMs); however, the key bottleneck hindering further scaling lies not in the scarcity of high-quality solutions, but in the lack of…

Software Engineering · Computer Science 2026-05-22 Yuxuan Sun, Yuze Zhao, Yufeng Wang, Yao Du, Zhiyuan Ma, Jinbo Wang, Mengdi Zhang, Kai Zhang, Zhenya Huang

What Are We Really Testing in Mutation Testing for Machine Learning? A Critical Reflection

Mutation testing is a well-established technique for assessing a test suite's quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep…

Software Engineering · Computer Science 2021-03-03 Annibale Panichella, Cynthia C. S. Liem

Does mutation testing improve testing practices?

Various proxy metrics for test quality have been defined in order to guide developers when writing tests. Code coverage is particularly well established in practice, even though the question of how coverage relates to test quality is a…

Software Engineering · Computer Science 2021-03-15 Goran Petrović, Marko Ivanković, Gordon Fraser, René Just

Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction

Many automated test generation techniques have been developed to aid developers with writing tests. To facilitate full automation, most existing techniques aim to either increase coverage, or generate exploratory inputs. However, existing…

Software Engineering · Computer Science 2023-07-26 Sungmin Kang, Juyeon Yoon, Shin Yoo

Learning How to Mutate Source Code from Bug-Fixes

Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults; yet they also indicated a…

Software Engineering · Computer Science 2019-07-31 Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk

Evaluation and Improvement of Fault Detection for Large Language Models

Large language models (LLMs) have recently achieved significant success across various application domains, garnering substantial attention from different communities. Unfortunately, even for the best LLM, many faults still exist…

Software Engineering · Computer Science 2024-11-06 Qiang Hu, Jin Wen, Maxime Cordy, Yuheng Huang, Wei Ma, Xiaofei Xie, Lei Ma

Assessing the Impact of Code Changes on the Fault Localizability of Large Language Models

Generative Large Language Models (LLMs) are increasingly used in non-generative software maintenance tasks, such as fault localization (FL). Success in FL depends on a model's ability to reason about program semantics beyond surface-level…

Software Engineering · Computer Science 2026-03-06 Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R. Butt, Mohammad Taha Khan, Muhammad Ali Gulzar

Large Language Models for Software Testing: A Research Roadmap

Large Language Models (LLMs) are starting to be profiled as one of the most significant disruptions in the Software Testing field. Specifically, they have been successfully applied in software testing tasks such as generating test code, or…

Software Engineering · Computer Science 2025-09-30 Cristian Augusto, Antonia Bertolino, Guglielmo De Angelis, Francesca Lonetti, Jesús Morán

Mull it over: mutation testing based on LLVM

This paper describes Mull, an open-source tool for mutation testing based on the LLVM framework. Mull works with LLVM IR, a low-level intermediate representation, to perform mutations, and uses LLVM JIT for just-in-time compilation. This…

Software Engineering · Computer Science 2019-08-06 Alex Denisov, Stanislav Pankevich

Practical Mutation Testing at Scale

Mutation analysis assesses a test suite's adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria.…

Software Engineering · Computer Science 2021-03-01 Goran Petrović, Marko Ivanković, Gordon Fraser, René Just

MMT: Mutation Testing of Java Bytecode with Model Transformation -- An Illustrative Demonstration

Mutation testing is an approach to check the robustness of test suites. The program code is slightly changed by mutations to inject errors. A test suite is robust enough if it finds such errors. Tools for mutation testing usually integrate…

Software Engineering · Computer Science 2024-04-23 Christoph Bockisch, Gabriele Taentzer, Daniel Neufeld

Mutation-Guided Unit Test Generation with a Large Language Model

Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation.…

Software Engineering · Computer Science 2026-04-17 Guancheng Wang, Qinghua Xu, Lionel Briand, Kui Liu

Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…

Software Engineering · Computer Science 2023-11-10 Sungmin Kang, Juyeon Yoon, Nargiz Askarbekkyzy, Shin Yoo

Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy

Large Language Models (LLMs) and Multi-Agent LLMs (MALLMs) introduce non-determinism unlike traditional or machine learning software, requiring new approaches to verifying correctness beyond simple output comparisons or statistical accuracy…

Software Engineering · Computer Science 2025-10-22 Felix Dobslaw, Robert Feldt, Juyeon Yoon, Shin Yoo

Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing

A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content, including grammatically precise sentences, human-like paragraphs, and syntactically accurate code snippets. LLMs can play…

Software Engineering · Computer Science 2023-12-11 Robson Santos, Italo Santos, Cleyton Magalhaes, Ronnie de Souza Santos