Related papers: TestForge: Feedback-Driven, Agentic Test Suite Gen…

AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering

Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code…

Software Engineering · Computer Science 2026-04-16 Rajesh Kumar, Waqar Ali, Junaid Ahmed, Najma Imtiaz Ali, Shaban Usman

APITestGenie: Automated API Test Generation through Generative AI

Intelligent assistants powered by Large Language Models (LLMs) can generate program and test code with high accuracy, boosting developers' and testers' productivity. However, there is a lack of studies exploring LLMs for testing Web APIs,…

Software Engineering · Computer Science 2024-09-09 André Pereira, Bruno Lima, João Pascoal Faria

Enhancing LLM-Based Test Generation by Eliminating Covered Code

Automated test generation is essential for software quality assurance, with coverage rate serving as a key metric to ensure thorough testing. Recent advancements in Large Language Models (LLMs) have shown promise in improving test…

Software Engineering · Computer Science 2026-02-26 WeiZhe Xu, Mengyu Liu, Fanxin Kong

A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites

Unit tests represent the most basic level of testing within the software testing lifecycle and are crucial to ensuring software correctness. Designing and creating unit tests is a costly and labor-intensive process that is ripe for…

Software Engineering · Computer Science 2025-07-31 Andrea Lops, Fedelucio Narducci, Azzurra Ragone, Michelantonio Trizio, Claudio Bartolini

UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification

This paper introduces UnitTenX, a state-of-the-art open-source AI multi-agent system designed to generate unit tests for legacy code, enhancing test coverage and critical value testing. UnitTenX leverages a combination of AI agents, formal…

Software Engineering · Computer Science 2025-10-08 Yiannis Charalambous, Claudionor N. Coelho, Luis Lamb, Lucas C. Cordeiro

A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge

The emergence of LLMs has catalyzed a paradigm shift in autonomous agent development, enabling systems capable of reasoning, planning, and executing complex multi-step tasks. However, existing agent frameworks often suffer from…

Artificial Intelligence · Computer Science 2026-01-21 Akbar Anbar Jafari, Cagri Ozcinar, Gholamreza Anbarjafari

How well LLM-based test generation techniques perform with newer LLM versions?

The rapid evolution of Large Language Models (LLMs) has strongly impacted software engineering, leading to a growing number of studies on automated unit test generation. However, the standalone use of LLMs without post-processing has proven…

Software Engineering · Computer Science 2026-01-15 Michael Konstantinou, Renzo Degiovanni, Mike Papadakis

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there…

Software Engineering · Computer Science 2025-03-20 Kush Jain, Gabriel Synnaeve, Baptiste Rozière

SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization

LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure…

Cryptography and Security · Computer Science 2026-05-12 Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann, Duncan Eddy, Christopher D. Manning, Mykel Kochenderfer, Diyi Yang

SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

Evaluating software engineering capabilities has become a core component of modern large language models (LLMs); however, the key bottleneck hindering further scaling lies not in the scarcity of high-quality solutions, but in the lack of…

Software Engineering · Computer Science 2026-05-22 Yuxuan Sun, Yuze Zhao, Yufeng Wang, Yao Du, Zhiyuan Ma, Jinbo Wang, Mengdi Zhang, Kai Zhang, Zhenya Huang

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains,…

Human-Computer Interaction · Computer Science 2024-05-07 Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman

Mutation-Guided Unit Test Generation with a Large Language Model

Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation.…

Software Engineering · Computer Science 2026-04-17 Guancheng Wang, Qinghua Xu, Lionel Briand, Kui Liu

HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning

Existing LLM-based automatic test generation methods mainly produce input and expected output pairs to categorize the intended behavior of correct programs. Although straightforward, these methods have limited diversity in generated tests…

Software Engineering · Computer Science 2025-11-04 Yujian Liu, Jiabao Ji, Yang Zhang, Wenbo Guo, Tommi Jaakkola, Shiyu Chang

Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation

Generating tests automatically is a key and ongoing area of focus in software engineering research. The emergence of Large Language Models (LLMs) has opened up new opportunities, given their ability to perform a wide spectrum of tasks.…

Software Engineering · Computer Science 2025-01-20 Azat Abdullin, Pouria Derakhshanfar, Annibale Panichella

Rethinking Verification for LLM Code Generation: From Generation to Testing

Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. However, a detailed examination reveals that these evaluation suites often comprise only a limited number…

Computation and Language · Computer Science 2025-07-11 Zihan Ma, Taolin Zhang, Maosong Cao, Junnan Liu, Wenwei Zhang, Minnan Luo, Songyang Zhang, Kai Chen

AutoTestForge: A Multidimensional Automated Testing Framework for Natural Language Processing Models

In recent years, the application of behavioral testing in Natural Language Processing (NLP) model evaluation has experienced a remarkable and substantial growth. However, the existing methods continue to be restricted by the requirements…

Software Engineering · Computer Science 2025-03-10 Hengrui Xing, Cong Tian, Liang Zhao, Zhi Ma, WenSheng Wang, Nan Zhang, Chao Huang, Zhenhua Duan

The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance

Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of…

Software Engineering · Computer Science 2026-01-07 Saba Naqvi, Mohammad Baqar, Nawaz Ali Mohammad

Large Language Models for Unit Test Generation: Achievements, Challenges, and Opportunities

Automated unit test generation is critical for software quality but traditional structure-driven methods often lack the semantic understanding required to produce realistic inputs and oracles. Large language models (LLMs) address this…

Software Engineering · Computer Science 2026-01-01 Bei Chu, Yang Feng, Kui Liu, Zhaoqiang Guo, Yichi Zhang, Hange Shi, Zifan Nan, Baowen Xu

FeedbackLLM: Metadata driven Multi-Agentic Language Agnostic Test Case Generator with Evolving prompt and Coverage Feedback

Traditional approaches to test case generation often involve manual effort and incur significant computational overhead. Additionally, these approaches are not scalable, and hence, unsuitable for complex software systems. Recently, Large…

Software Engineering · Computer Science 2026-05-05 Kushal Jasti, Tejamani Prashanth Sahu, Rishitha Pentyala, Muvvala Mohit, Vivek Yelleti

Agent Laboratory: Using LLM Agents as Research Assistants

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research…

Human-Computer Interaction · Computer Science 2025-06-18 Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum