Related papers: $\mu$BERT: Mutation Testing using Pre-Trained Lang…

Efficient Mutation Testing via Pre-Trained Language Models

Mutation testing is an established fault-based testing technique. It operates by seeding faults into the programs under test and asking developers to write tests that reveal these faults. These tests have the potential to reveal a large…

Software Engineering · Computer Science 2023-01-10 Ahmed Khanfir, Renzo Degiovanni, Mike Papadakis, Yves Le Traon

Simulink Mutation Testing using CodeBERT

We present BERTiMuS, an approach that uses CodeBERT to generate mutants for Simulink models. BERTiMuS converts Simulink models into textual representations, masks tokens from the derived text, and uses CodeBERT to predict the masked tokens.…

Software Engineering · Computer Science 2025-01-14 Jingfan Zhang, Delaram Ghobari, Mehrdad Sabetzadeh, Shiva Nejati

Contextual Predictive Mutation Testing

Mutation testing is a powerful technique for assessing and improving test suite quality that artificially introduces bugs and checks whether the test suites catch them. However, it is also computationally expensive and thus does not scale…

Software Engineering · Computer Science 2023-09-06 Kush Jain, Uri Alon, Alex Groce, Claire Le Goues

Vulnerability Mimicking Mutants

With the increasing release of powerful language models trained on large code corpus (e.g. CodeBERT was trained on 6.4 million programs), a new family of mutation testing tools has arisen with the promise to generate more "natural" mutants…

Software Engineering · Computer Science 2023-03-09 Aayush Garg, Renzo Degiovanni, Mike Papadakis, Yves Le Traon

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code…

Computation and Language · Computer Science 2020-09-21 Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou

SimpleBERT: A Pre-trained Model That Learns to Generate Simple Words

Pre-trained models are widely used in the tasks of natural language processing nowadays. However, in the specific field of text simplification, the research on improving pre-trained models is still blank. In this work, we propose a…

Computation and Language · Computer Science 2022-04-19 Renliang Sun, Xiaojun Wan

Learning How to Mutate Source Code from Bug-Fixes

Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults; yet they also indicated a…

Software Engineering · Computer Science 2019-07-31 Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk

DeepMutation: A Neural Mutation Tool

Mutation testing can be used to assess the fault-detection capabilities of a given test suite. To this aim, two characteristics of mutation testing frameworks are of paramount importance: (i) they should generate mutants that are…

Software Engineering · Computer Science 2020-02-14 Michele Tufano, Jason Kimko, Shiya Wang, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Denys Poshyvanyk

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to…

Computation and Language · Computer Science 2020-03-25 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging

Large Language Models (LLMs) can generate plausible test code. Intuitively they generate this by imitating tests seen in their training data, rather than reasoning about execution semantics. However, such reasoning is important when…

Software Engineering · Computer Science 2025-03-12 Philipp Straubinger, Marvin Kreis, Stephan Lukasczyk, Gordon Fraser

An Empirical Study of the Realism of Mutants in Deep Learning

Mutation analysis is a well-established technique for assessing test quality in the traditional software development paradigm by injecting artificial faults into programs. Its application to deep learning (DL) has expanded beyond classical…

Software Engineering · Computer Science 2025-12-19 Zaheed Ahmed, Philip Makedonski, Jens Grabowski

JavaBERT: Training a transformer-based model for the Java programming language

Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing…

Software Engineering · Computer Science 2021-10-22 Nelson Tavares de Sousa, Wilhelm Hasselbring

Syntactic Vs. Semantic similarity of Artificial and Real Faults in Mutation Testing Studies

Fault seeding is typically used in controlled studies to evaluate and compare test techniques. Central to these techniques lies the hypothesis that artificially seeded faults involve some form of realistic properties and thus provide…

Software Engineering · Computer Science 2021-12-30 Milos Ojdanic, Aayush Garg, Ahmed Khanfir, Renzo Degiovanni, Mike Papadakis, Yves Le Traon

PITMuS: A Tool for Automated Bug Dataset Generation via Source-Level Mutant Reconstruction

LLM-based software engineering increasingly depends on executable, context-rich bug artifacts: paired correct and buggy code, methods under test (MUTs), documentation, and metadata. These artifacts support the training and evaluation of…

Software Engineering · Computer Science 2026-05-22 Tasfia Tasnim, Soneya Binta Hossain

Mutation-Guided Unit Test Generation with a Large Language Model

Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation.…

Software Engineering · Computer Science 2026-04-17 Guancheng Wang, Qinghua Xu, Lionel Briand, Kui Liu

Predictive Mutation Analysis via Natural Language Channel in Source Code

Mutation analysis can provide valuable insights into both System Under Test (SUT) and its test suite. However, it is not scalable due to the cost of building and testing a large number of mutants. Predictive Mutation Testing (PMT) has been…

Software Engineering · Computer Science 2022-09-15 Jinhan Kim, Juyoung Jeon, Shin Hong, Shin Yoo

ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning

Large-scale pre-trained models such as CodeBERT, GraphCodeBERT have earned widespread attention from both academia and industry. Attributed to the superior ability in code representation, they have been further applied in multiple…

Software Engineering · Computer Science 2023-01-24 Shangqing Liu, Bozhi Wu, Xiaofei Xie, Guozhu Meng, Yang Liu

MC-BERT: Efficient Language Pre-Training via a Meta Controller

Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate…

Computation and Language · Computer Science 2020-06-17 Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu

AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization

Pre-trained language models such as BERT have exhibited remarkable performances in many tasks in natural language understanding (NLU). The tokens in the models are usually fine-grained in the sense that for languages like English they are…

Computation and Language · Computer Science 2021-05-28 Xinsong Zhang, Pengshuai Li, Hang Li

A Comprehensive Study on Large Language Models for Mutation Testing

Large Language Models (LLMs) have recently been used to generate mutants in both research work and in industrial practice. However, there has been no comprehensive empirical study of their performance for this increasingly important…

Software Engineering · Computer Science 2026-01-23 Bo Wang, Mingda Chen, Ming Deng, Youfang Lin, Mark Harman, Mike Papadakis, Jie M. Zhang