Related papers: TESTEVAL: Benchmarking Large Language Models for T…

TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models

Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based…

Software Engineering · Computer Science 2024-09-27 Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there…

Software Engineering · Computer Science 2025-03-20 Kush Jain, Gabriel Synnaeve, Baptiste Rozière

McEval: Massively Multilingual Code Evaluation

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard…

Programming Languages · Computer Science 2024-06-12 Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li

Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, capable of tackling complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test…

Computation and Language · Computer Science 2026-01-15 Yuhan Cao, Zian Chen, Kun Quan, Ziliang Zhang, Yu Wang, Xiaoning Dong, Yeqi Feng, Guanzhong He, Jingcheng Huang, Jianhao Li, Yixuan Tan, Jiafu Tang, Yilin Tang, Junlei Wu, Qianyu Xiao, Can Zheng, Shouchen Zhou, Yuxiang Zhu, Yiming Huang, Tianxing He

Large Language Models as Test Case Generators: Performance Evaluation and Enhancement

Code generation with Large Language Models (LLMs) has been extensively studied and achieved remarkable progress. As a complementary aspect to code generation, test case generation is of crucial importance in ensuring the quality and…

Software Engineering · Computer Science 2024-04-23 Kefan Li, Yuan Yuan

Large Language Models for Software Testing: A Research Roadmap

Large Language Models (LLMs) are starting to be profiled as one of the most significant disruptions in the Software Testing field. Specifically, they have been successfully applied in software testing tasks such as generating test code, or…

Software Engineering · Computer Science 2025-09-30 Cristian Augusto, Antonia Bertolino, Guglielmo De Angelis, Francesca Lonetti, Jesús Morán

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising…

Computation and Language · Computer Science 2023-10-03 Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges

This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language…

Software Engineering · Computer Science 2023-12-25 Roberto Francisco de Lima Junior, Luiz Fernando Paes de Barros Presta, Lucca Santos Borborema, Vanderson Nogueira da Silva, Marcio Leal de Melo Dahia, Anderson Carlos Sousa e Santos

Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focused on simplified or isolated aspects of coding, such as single-file code generation…

Computation and Language · Computer Science 2024-12-17 Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

CodeEval: A pedagogical approach for targeted evaluation of code-trained Large Language Models

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman, Mohammad Mahoor

Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing

A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content, including grammatically precise sentences, human-like paragraphs, and syntactically accurate code snippets. LLMs can play…

Software Engineering · Computer Science 2023-12-11 Robson Santos, Italo Santos, Cleyton Magalhaes, Ronnie de Souza Santos

DevEval: Evaluating Code Generation in Practical Software Projects

How to evaluate Large Language Models (LLMs) in code generation is an open question. Many benchmarks have been proposed but are inconsistent with practical software projects, e.g., unreal program distributions, insufficient dependencies,…

Software Engineering · Computer Science 2024-03-07 Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Zhi Jin, Hao Zhu, Huanyu Liu, Kaibo Liu, Lecheng Wang, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yihong Dong, Yuqi Zhu, Bin Gu, Mengfei Yang

Software Testing with Large Language Models: Survey, Landscape, and Vision

Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance…

Software Engineering · Computer Science 2024-03-05 Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, Qing Wang

Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure

We introduce TestCase-Eval, a new benchmark for systematic evaluation of LLMs in test-case generation. TestCase-Eval includes 500 algorithm problems and 100,000 human-crafted solutions from the Codeforces platform. It focuses on two pivotal…

Software Engineering · Computer Science 2025-06-17 Zheyuan Yang, Zexi Kuang, Xue Xia, Yilun Zhao

Software Testing with Large Language Models: An Interview Study with Practitioners

Background: The use of large language models in software testing is growing fast as they support numerous tasks, from test case generation to automation, and documentation. However, their adoption often relies on informal…

Software Engineering · Computer Science 2025-10-21 Maria Deolinda Santana, Cleyton Magalhaes, Ronnie de Souza Santos

A Tool for Test Case Scenarios Generation Using Large Language Models

Large Language Models (LLMs) are widely used in Software Engineering (SE) for various tasks, including generating code, designing and documenting software, adding code comments, reviewing code, and writing test scripts. However, creating…

Software Engineering · Computer Science 2024-06-12 Abdul Malik Sami, Zeeshan Rasheed, Muhammad Waseem, Zheying Zhang, Herda Tomas, Pekka Abrahamsson

MdEval: Massively Multilingual Code Debugging

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippets and their…

Computation and Language · Computer Science 2025-02-25 Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Ke Jin, Wei Zhang, Hualei Zhu, Shuyue Guo, Tao Sun, Jiaheng Liu, Yunlong Duan, Yu Hao, Liqun Yang, Guanglin Niu, Ge Zhang, Zhoujun Li

ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code

In recent years, the application of large language models (LLMs) to code-related tasks has gained significant attention. However, existing evaluation benchmarks often focus on limited scenarios, such as code generation or completion, which…

Software Engineering · Computer Science 2024-09-17 Jia Feng, Jiachen Liu, Cuiyun Gao, Chun Yong Chong, Chaozheng Wang, Shan Gao, Xin Xia

A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems. To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to…

Software Engineering · Computer Science 2024-06-13 Sinclair Hudson, Sophia Jit, Boyue Caroline Hu, Marsha Chechik

A System Model Generation Benchmark from Natural Language Requirements

System models, a critical artifact in software development, provide a formal abstraction of both the structural and behavioral aspects of software systems, which can facilitate the early requirements analysis and architecture design.…

Software Engineering · Computer Science 2025-08-06 Dongming Jin, Zhi Jin, Linyu Li, Zheng Fang, Jia Li, Xiaohong Chen