Computation and Language · Computer Science
ExecRepoBench: Multi-level Executable Code Completion Evaluation
Jian Yang, Jiajun Zhang, Jiaxi Yang, Ke Jin +8
2024-12-17
Software Engineering · Computer Science
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices
Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang +8
2026-04-27
Software Engineering · Computer Science
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository
Ajinkya Deshpande, Anmol Agarwal, Shashank Shet, Arun Iyer +3
2024-06-06
Software Engineering · Computer Science
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang +4
2026-05-18
Software Engineering · Computer Science
CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks
Lingyue Fu, Hao Guan, Bolun Zhang, Haowei Yuan +9
2026-01-08
Computation and Language · Computer Science
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung +5
2023-10-23
Computation and Language · Computer Science
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Jason Chou, Ao Liu, Yuchi Deng, Zhiying Zeng +12
2025-08-13
Software Engineering · Computer Science
RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation
Yanli Wang, Yanlin Wang, Suiquan Wang, Daya Guo +7
2025-12-17
Software Engineering · Computer Science
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'
Shanchao Liang, Yiran Hu, Nan Jiang, Lin Tan
2025-06-26
Software Engineering · Computer Science
RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models
Jingjing Liu, Zeming Liu, Zihao Cheng, Mengliang He +6
2025-09-09
Software Engineering · Computer Science
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu +29
2025-04-02
Software Engineering · Computer Science
CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation
Wenjing Yin, Tianze Sun, Yijiong Yu, Jiawei Fang +18
2025-04-30
Software Engineering · Computer Science
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao +5
2025-06-23
Software Engineering · Computer Science
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao +5
2025-03-20
Artificial Intelligence · Computer Science
PromptBench: A Unified Library for Evaluation of Large Language Models
Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang +1
2024-08-21
Software Engineering · Computer Science
FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation
Hongda Zhu, Yiwen Zhang, Bing Zhao, Jingzhe Ding +5
2025-06-19
Software Engineering · Computer Science
CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments
Forough Mehralian, Ryan Shar, James R. Rae, Alireza Hashemi
2025-11-03
Software Engineering · Computer Science
RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion
Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui
2024-08-15
Software Engineering · Computer Science
RepoMasterEval: Evaluating Code Completion via Real-World Repositories
Qinyun Wu, Chao Peng, Pengfei Gao, Ruida Hu +8
2025-11-03
Software Engineering · Computer Science
RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
Kenan Li, Rongzhi Li, Linghao Zhang, Qirui Jin +16
2026-03-06
Software Engineering · Computer Science
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy +13
2025-09-12
Cryptography and Security · Computer Science
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)
Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin
2026-05-01
Software Engineering · Computer Science
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain, King Han, Alex Gu, Wen-Ding Li +6
2024-06-07
Computation and Language · Computer Science
BenchBench: Benchmarking Automated Benchmark Generation
Yandan Zheng, Haoran Luo, Zhenghong Lin, Wenjin Liu +1
2026-03-24
Programming Languages · Computer Science
MHRC-Bench: A Multilingual Hardware Repository-Level Code Completion benchmark
Qingyun Zou, Jiahao Cui, Nuo Chen, Bingsheng He +1
2026-02-03