English
Related papers

Related papers: WebApp1K: A Practical Code-Generation Benchmark fo…

200 papers

This paper presents insights from evaluating 16 frontier large language models (LLMs) on the WebApp1K benchmark, a test suite designed to assess the ability of LLMs to generate web application code. The results reveal that while all models…

Software Engineering · Computer Science 2024-09-10 Yi Cui

We introduce WebApp1K, a novel benchmark for evaluating large language models (LLMs) in test-driven development (TDD) tasks, where test cases serve as both prompt and verification for code generation. Unlike traditional approaches relying…

Software Engineering · Computer Science 2025-05-15 Yi Cui

Web applications (web apps) have become a key arena for large language models (LLMs) to demonstrate their code generation capabilities and commercial potential. However, building a benchmark for LLM-generated web apps remains challenging…

Software Engineering · Computer Science 2026-03-17 Chenxu Liu , Yingjie Fu , Wei Yang , Ying Zhang , Tao Xie

Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks, yet real-world software development increasingly demands class-level implementations that integrate multiple methods,…

Software Engineering · Computer Science 2025-11-06 Musfiqur Rahman , SayedHassan Khatoonabadi , Emad Shihab

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and…

Software Engineering · Computer Science 2026-04-27 Jia Li , Hongyi Deng , Yiran Zhang , Kechi Zhang , Tianqi Shao , Tiankuo Zhao , Weinan Wang , Zhi Jin , Ge Li , Yang Liu , Yingtao Fang , Yihong Dong

Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors including the algorithm, its implementation, and hardware among others. Causes of poor…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Daniel Nichols , Pranav Polasam , Harshitha Menon , Aniruddha Marathe , Todd Gamblin , Abhinav Bhatele

While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating…

The recent advancements of Small Language Models (SLMs) have opened new possibilities for efficient code generation. SLMs offer lightweight and cost-effective alternatives to Large Language Models (LLMs), making them attractive for use in…

Software Engineering · Computer Science 2026-01-21 Md Mahade Hasan , Muhammad Waseem , Kai-Kristian Kemell , Jussi Rasku , Juha Ala-Rantala , Pekka Abrahamsson

As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functional correctness, overlooking the diversity of…

Software Engineering · Computer Science 2025-11-03 Forough Mehralian , Ryan Shar , James R. Rae , Alireza Hashemi

This paper presents a case study of coding tasks by the latest reasoning models of OpenAI, i.e. o1-preview and o1-mini, in comparison with other frontier models. The o1 models deliver SOTA results for WebApp1K, a single-task benchmark. To…

Software Engineering · Computer Science 2024-09-24 Yi Cui

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is…

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…

Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code…

Software Engineering · Computer Science 2024-08-12 Kechi Zhang , Jia Li , Ge Li , Xianjie Shi , Zhi Jin

Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This…

Computation and Language · Computer Science 2025-10-28 Juyong Jiang , Fan Wang , Jiasi Shen , Sungju Kim , Sunghun Kim

The application of large language models (LLMs) in the field of coding is evolving rapidly: from code assistants, to autonomous coding agents, and then to generating complete projects through natural language. Early LLM code benchmarks…

Artificial Intelligence · Computer Science 2025-05-13 Kai Xu , YiWei Mao , XinYi Guan , ZiLong Feng

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh , Beiyu Lin

The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…

With the rapid advancement of Large Language Models (LLMs), the demand for robust instruction-following capabilities in code generation tasks has grown significantly. Code generation not only facilitates faster prototyping and automated…

Software Engineering · Computer Science 2025-08-05 Kaiwen Yan , Hongcheng Guo , Xuanqing Shi , Shaosheng Cao , Donglin Di , Zhoujun Li

This study presents a comprehensive empirical evaluation of six state-of-the-art large language models (LLMs) for code generation, including both general-purpose and code-specialized models. Using a dataset of 944 real-world LeetCode…

Software Engineering · Computer Science 2025-12-23 Le Zhang , Suresh Kothari
‹ Prev 1 2 3 10 Next ›