Software Engineering · Computer Science
WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang +1
2026-03-17
Software Engineering · Computer Science
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices
Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang +8
2026-04-27
Distributed, Parallel, and Cluster Computing · Computer Science
Performance-Aligned LLMs for Generating Fast Code
Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe +2
2024-04-30
Software Engineering · Computer Science
Measuring Coding Challenge Competence With APPS
Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika +7
2021-11-10
Software Engineering · Computer Science
Assessing Small Language Models for Code Generation: An Empirical Study with Benchmarks
Md Mahade Hasan, Muhammad Waseem, Kai-Kristian Kemell, Jussi Rasku +2
2026-01-21
Software Engineering · Computer Science
CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments
Forough Mehralian, Ryan Shar, James R. Rae, Alireza Hashemi
2025-11-03
Computation and Language · Computer Science
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du +12
2024-03-12
Software Engineering · Computer Science
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang +4
2026-05-18
Software Engineering · Computer Science
Large Language Models for Code Generation: The Practitioners Perspective
Zeeshan Rasheed, Muhammad Waseem, Kai Kristian Kemell, Aakash Ahmad +4
2025-01-29
Software Engineering · Computer Science
CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges
Kechi Zhang, Jia Li, Ge Li, Xianjie Shi +1
2024-08-12
Computation and Language · Computer Science
A Survey on Large Language Models for Code Generation
Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim +1
2025-10-28
Artificial Intelligence · Computer Science
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng
2025-05-13
Software Engineering · Computer Science
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng +13
2025-10-20
Software Engineering · Computer Science
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
Kaiwen Yan, Hongcheng Guo, Xuanqing Shi, Shaosheng Cao +2
2025-08-05
Software Engineering · Computer Science
FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation
Hongda Zhu, Yiwen Zhang, Bing Zhao, Jingzhe Ding +5
2025-06-19
Software Engineering · Computer Science
Assessing and Improving the Representativeness of Code Generation Benchmarks Using Knowledge Units (KUs) of Programming Languages -- An Empirical Study
Md Ahasanuzzaman, Bram Adams, Emad Fallahzadeh, Gustavo A. Oliva +1
2026-01-08
High Energy Physics - Experiment · Physics
CelloAI Benchmarks: Toward Repeatable Evaluation of AI Assistants
Mohammad Atif, Kriti Chopra, Fang-Ying Tsai, Ozgur O. Kilic +7
2026-03-03
Software Engineering · Computer Science
Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback
Stephan Wallraven, Tim Köhne, Hartmut Westenberger, Andreas Moser
2026-01-22