Related papers: Scaling Test-Driven Code Generation from Functions…

Test-Driven Development for Code Generation

Recent Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements. This increasingly automated process mirrors traditional human-led software development, where code…

Software Engineering · Computer Science 2024-10-23 Noble Saji Mathews, Meiyappan Nagappan

Enhancing Large Language Models for Text-to-Testcase Generation

Context: Test-driven development (TDD) is a widely employed software development practice that involves developing test cases based on requirements prior to writing the code. Although various methods for automated test case generation have…

Software Engineering · Computer Science 2025-04-02 Saranya Alagarsamy, Chakkrit Tantithamthavorn, Wannita Takerngsaksiri, Chetan Arora, Aldeida Aleti

Leveraging Test Driven Development with Large Language Models for Reliable and Verifiable Spreadsheet Code Generation: A Research Framework

Large Language Models (LLMs), such as ChatGPT, are increasingly leveraged for generating both traditional software code and spreadsheet logic. Despite their impressive generative capabilities, these models frequently exhibit critical issues…

Software Engineering · Computer Science 2025-11-27 Simon Thorne, Advait Sarkar

From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

Coding agents can generate web applications from natural-language descriptions, yet a recent benchmark study shows that generated applications fail to meet functional requirements in over 70% of cases. The core difficulty is that web…

Software Engineering · Computer Science 2026-05-19 Yuxuan Wan, Tingshuo Liang, Jiakai Xu, Jingyu Xiao, Yintong Huo, Michael R Lyu

ClassEval-T: Evaluating Large Language Models in Class-Level Code Translation

In recent years, Large Language Models (LLMs) have dramatically advanced the performance of automated code translation, making their computational accuracy score reach up to over 80% on many previous benchmarks. However, most code samples…

Software Engineering · Computer Science 2025-04-15 Pengyu Xue, Linhao Wu, Zhen Yang, Chengyi Wang, Xiang Li, Yuxiang Zhang, Jia Li, Ruikai Jin, Yifei Pei, Zhaoyan Shen, Xiran Lyu, Jacky Wai Keung

A Family of Experiments on Test-Driven Development

Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the…

Software Engineering · Computer Science 2020-11-25 Adrian Santos, Sira Vegas, Oscar Dieste, Fernando Uyaguari, Aysee Tosun, Davide Fucci, Burak Turhan, Giuseppe Scanniello, Simone Romano, Itir Karac, Marco Kuhrmann, Vladimir Mandic, Robert Ramac, Dietmar Pfahl, Christian Engblom, Jarno Kyykka, Kerli Rungi, Carolina Palomeque, Jaroslav Spisak, Markku Oivo, Natalia Juristo

TDD Governance for Multi-Agent Code Generation via Prompt Engineering

Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured…

Software Engineering · Computer Science 2026-04-30 Tarlan Hasanli, Shahbaz Siddeeq, Bishwash Khanal, Pyry Kotilainen, Tommi Mikkonen, Pekka Abrahamsson

Fixing Function-Level Code Generation Errors for Foundation Large Language Models

Function-level code generation leverages foundation Large Language Models (LLMs) to automatically produce source code with expected functionality. It has been widely investigated and applied in intelligent programming assistants, such as…

Software Engineering · Computer Science 2025-01-22 Hao Wen, Yueheng Zhu, Chao Liu, Xiaoxue Ren, Weiwei Du, Meng Yan

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation

We introduce WebApp1K, a novel benchmark for evaluating large language models (LLMs) in test-driven development (TDD) tasks, where test cases serve as both prompt and verification for code generation. Unlike traditional approaches relying…

Software Engineering · Computer Science 2025-05-15 Yi Cui

TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved?

Test-driven development (TDD) is the practice of writing tests first and coding later, and the proponents of TDD expound its numerous benefits. For instance, given an issue on a source code repository, tests can clarify the desired behavior…

Software Engineering · Computer Science 2024-12-05 Toufique Ahmed, Martin Hirzel, Rangeet Pan, Avraham Shinnar, Saurabh Sinha

TENET: Leveraging Tests Beyond Validation for Code Generation

Test-Driven Development (TDD) is a widely adopted software engineering practice that requires developers to create and execute tests alongside code implementation, ensuring that software behavior is continuously validated and refined. In…

Software Engineering · Computer Science 2025-10-01 Yiran Hu, Nan Jiang, Shanchao Liang, Yi Wu, Lin Tan

LLM4TDD: Best Practices for Test Driven Development Using Large Language Models

In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating…

Software Engineering · Computer Science 2023-12-11 Sanyogita Piya, Allison Sullivan

ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We first manually construct the first class-level code generation benchmark ClassEval of 100…

Computation and Language · Computer Science 2023-08-15 Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, Yiling Lou

Enhancing LLM Code Generation Capabilities through Test-Driven Development and Code Interpreter

Over the past few years, improving LLM code generation capabilities has been a key focus in NLP research. Despite Bengali having 242 million native speakers worldwide, it receives little attention when it comes to training LLMs. More…

Software Engineering · Computer Science 2025-11-18 Sajed Jalil, Shuvo Saha, Hossain Mohammad Seym

A Comparative Case Study on the Impact of Test-Driven Development on Program Design and Test Coverage

Test-driven development (TDD) is a programming technique in which the tests are written prior to the source code. It is proposed that TDD is one of the most fundamental practices enabling the development of software in an agile and…

Software Engineering · Computer Science 2017-11-15 Maria Siniaalto, Pekka Abrahamsson

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally…

Software Engineering · Computer Science 2026-04-30 Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu

Enhancing LLM-Based Test Generation by Eliminating Covered Code

Automated test generation is essential for software quality assurance, with coverage rate serving as a key metric to ensure thorough testing. Recent advancements in Large Language Models (LLMs) have shown promise in improving test…

Software Engineering · Computer Science 2026-02-26 WeiZhe Xu, Mengyu Liu, Fanxin Kong

IFEvalCode: Controlled Code Generation

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science 2025-08-04 Jian Yang, Wei Zhang, Shukai Liu, Linzheng Chai, Yingshui Tan, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou, Guanglin Niu, Zhoujun Li, Binyuan Hui, Junyang Lin

Designing Empirical Studies on LLM-Based Code Generation: Towards a Reference Framework

The rise of large language models (LLMs) has introduced transformative potential in automated code generation, addressing a wide range of software engineering challenges. However, empirical evaluation of LLM-based code generation lacks…

Software Engineering · Computer Science 2025-10-07 Nathalia Nascimento, Everton Guimaraes, Paulo Alencar

Self-Evaluation Improves Selective Generation in Large Language Models

Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely…

Computation and Language · Computer Science 2023-12-18 Jie Ren, Yao Zhao, Tu Vu, Peter J. Liu, Balaji Lakshminarayanan