Software Engineering · Computer Science
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang +18
2021-03-17
Software Engineering · Computer Science
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu +2
2024-10-04
Software Engineering · Computer Science
Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development
Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard +1
2026-05-15
Software Engineering · Computer Science
CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments
Forough Mehralian, Ryan Shar, James R. Rae, Alireza Hashemi
2025-11-03
Software Engineering · Computer Science
CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information
Maosheng Zhong, Gen Liu, Hongwei Li, Jiangling Kuang +2
2022-02-16
Software Engineering · Computer Science
FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation
Hongda Zhu, Yiwen Zhang, Bing Zhao, Jingzhe Ding +5
2025-06-19
Machine Learning · Computer Science
DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models
Adarsh Kumarappan, Pareesa Ameneh Golnari, Wen Wen, Xiaoyu Liu +4
2026-05-19
Machine Learning · Computer Science
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li +21
2023-03-30
Software Engineering · Computer Science
Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation
Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi
2023-11-07
Computation and Language · Computer Science
CodeT: Code Generation with Generated Tests
Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan +3
2022-11-24
Computation and Language · Computer Science
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Jason Chou, Ao Liu, Yuchi Deng, Zhiying Zeng +12
2025-08-13
Software Engineering · Computer Science
Measuring Coding Challenge Competence With APPS
Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika +7
2021-11-10
Computation and Language · Computer Science
BenchBench: Benchmarking Automated Benchmark Generation
Yandan Zheng, Haoran Luo, Zhenghong Lin, Wenjin Liu +1
2026-03-24
Computation and Language · Computer Science
IFEvalCode: Controlled Code Generation
Jian Yang, Wei Zhang, Shukai Liu, Linzheng Chai +8
2025-08-04
Software Engineering · Computer Science
CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
Hao Yu, Bo Shen, Dezhi Ran, Jiaxin Zhang +6
2024-02-26
Computation and Language · Computer Science
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang +6
2023-08-15
Machine Learning · Computer Science
AICD Bench: A Challenging Benchmark for AI-Generated Code Detection
Daniil Orel, Dilshod Azizov, Indraneil Paul, Yuxia Wang +2
2026-02-03
Artificial Intelligence · Computer Science
The Procedural Content Generation Benchmark: An Open-source Testbed for Generative Challenges in Games
Ahmed Khalifa, Roberto Gallotta, Matthew Barthet, Antonios Liapis +2
2025-03-31
Software Engineering · Computer Science
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang +6
2022-11-22
Software Engineering · Computer Science
MultiAIGCD: A Comprehensive dataset for AI Generated Code Detection Covering Multiple Languages, Models, Prompts, and Scenarios
Basak Demirok, Mucahid Kutlu, Selin Mergen
2025-07-30