Related papers: TestForge: Feedback-Driven, Agentic Test Suite Gen…

200 papers

Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code…

Software Engineering · Computer Science 2026-04-16 Rajesh Kumar, Waqar Ali, Junaid Ahmed, Najma Imtiaz Ali, Shaban Usman

Intelligent assistants powered by Large Language Models (LLMs) can generate program and test code with high accuracy, boosting developers' and testers' productivity. However, there is a lack of studies exploring LLMs for testing Web APIs,…

Software Engineering · Computer Science 2024-09-09 André Pereira, Bruno Lima, João Pascoal Faria

Automated test generation is essential for software quality assurance, with coverage rate serving as a key metric to ensure thorough testing. Recent advancements in Large Language Models (LLMs) have shown promise in improving test…

Software Engineering · Computer Science 2026-02-26 WeiZhe Xu, Mengyu Liu, Fanxin Kong

Unit tests represent the most basic level of testing within the software testing lifecycle and are crucial to ensuring software correctness. Designing and creating unit tests is a costly and labor-intensive process that is ripe for…

Software Engineering · Computer Science 2025-07-31 Andrea Lops, Fedelucio Narducci, Azzurra Ragone, Michelantonio Trizio, Claudio Bartolini

This paper introduces UnitTenX, a state-of-the-art open-source AI multi-agent system designed to generate unit tests for legacy code, enhancing test coverage and critical value testing. UnitTenX leverages a combination of AI agents, formal…

Software Engineering · Computer Science 2025-10-08 Yiannis Charalambous, Claudionor N. Coelho, Luis Lamb, Lucas C. Cordeiro

The emergence of LLMs has catalyzed a paradigm shift in autonomous agent development, enabling systems capable of reasoning, planning, and executing complex multi-step tasks. However, existing agent frameworks often suffer from…

Artificial Intelligence · Computer Science 2026-01-21 Akbar Anbar Jafari, Cagri Ozcinar, Gholamreza Anbarjafari

The rapid evolution of Large Language Models (LLMs) has strongly impacted software engineering, leading to a growing number of studies on automated unit test generation. However, the standalone use of LLMs without post-processing has proven…

Software Engineering · Computer Science 2026-01-15 Michael Konstantinou, Renzo Degiovanni, Mike Papadakis

Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there…

Software Engineering · Computer Science 2025-03-20 Kush Jain, Gabriel Synnaeve, Baptiste Rozière

LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure…

Cryptography and Security · Computer Science 2026-05-12 Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann, Duncan Eddy, Christopher D. Manning, Mykel Kochenderfer, Diyi Yang

Evaluating software engineering capabilities has become a core component of modern large language models (LLMs); however, the key bottleneck hindering further scaling lies not in the scarcity of high-quality solutions, but in the lack of…

Software Engineering · Computer Science 2026-05-22 Yuxuan Sun, Yuze Zhao, Yufeng Wang, Yao Du, Zhiyuan Ma, Jinbo Wang, Mengdi Zhang, Kai Zhang, Zhenya Huang

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains,…

Human-Computer Interaction · Computer Science 2024-05-07 Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman

Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation.…

Software Engineering · Computer Science 2026-04-17 Guancheng Wang, Qinghua Xu, Lionel Briand, Kui Liu

Existing LLM-based automatic test generation methods mainly produce input and expected output pairs to categorize the intended behavior of correct programs. Although straightforward, these methods have limited diversity in generated tests…

Software Engineering · Computer Science 2025-11-04 Yujian Liu, Jiabao Ji, Yang Zhang, Wenbo Guo, Tommi Jaakkola, Shiyu Chang

Generating tests automatically is a key and ongoing area of focus in software engineering research. The emergence of Large Language Models (LLMs) has opened up new opportunities, given their ability to perform a wide spectrum of tasks.…

Software Engineering · Computer Science 2025-01-20 Azat Abdullin, Pouria Derakhshanfar, Annibale Panichella

Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. However, a detailed examination reveals that these evaluation suites often comprise only a limited number…

Computation and Language · Computer Science 2025-07-11 Zihan Ma, Taolin Zhang, Maosong Cao, Junnan Liu, Wenwei Zhang, Minnan Luo, Songyang Zhang, Kai Chen

In recent years, the application of behavioral testing in Natural Language Processing (NLP) model evaluation has experienced a remarkable and substantial growth. However, the existing methods continue to be restricted by the requirements…

Software Engineering · Computer Science 2025-03-10 Hengrui Xing, Cong Tian, Liang Zhao, Zhi Ma, WenSheng Wang, Nan Zhang, Chao Huang, Zhenhua Duan

Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of…

Software Engineering · Computer Science 2026-01-07 Saba Naqvi, Mohammad Baqar, Nawaz Ali Mohammad

Automated unit test generation is critical for software quality but traditional structure-driven methods often lack the semantic understanding required to produce realistic inputs and oracles. Large language models (LLMs) address this…

Software Engineering · Computer Science 2026-01-01 Bei Chu, Yang Feng, Kui Liu, Zhaoqiang Guo, Yichi Zhang, Hange Shi, Zifan Nan, Baowen Xu

Traditional approaches to test case generation often involve manual effort and incur significant computational overhead. Additionally, these approaches are not scalable, and hence, unsuitable for complex software systems. Recently, Large…

Software Engineering · Computer Science 2026-05-05 Kushal Jasti, Tejamani Prashanth Sahu, Rishitha Pentyala, Muvvala Mohit, Vivek Yelleti

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research…

Human-Computer Interaction · Computer Science 2025-06-18 Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum