Related papers: Evaluating and Achieving Controllable Code Complet…

200 papers

Large Language Models (LLMs) are increasingly applied to real-world code generation, where functional correctness alone is insufficient for reliable deployment; developers also expect adherence to explicit requirements for robustness,…

Software Engineering · Computer Science 2025-12-22 Sravani Gunnu , Shanmukha Guttula , Hima Patel

Code completion has become an essential tool for daily software development. Existing evaluation benchmarks often employ static methods that do not fully capture the dynamic nature of real-world coding environments and face significant…

Computation and Language · Computer Science 2024-12-17 Jian Yang , Jiajun Zhang , Jiaxi Yang , Ke Jin , Lei Zhang , Qiyao Peng , Ken Deng , Yibo Miao , Tianyu Liu , Zeyu Cui , Binyuan Hui , Junyang Lin

Recent frontier-level LLMs have saturated many previously difficult benchmarks, leaving little room for further differentiation. This progress highlights the need for challenging benchmarks that provide objective verification. In this…

Computation and Language · Computer Science 2025-10-10 Hyeonseok Moon , Seongtae Hong , Jaehyung Seo , Heuiseok Lim

Large language models (LLMs) play a crucial role in software engineering, excelling in tasks like code generation and maintenance. However, existing benchmarks are often narrow in scope, focusing on a specific task and lacking a comprehensive…

Enhancing the ability of large language models (LLMs) to follow complex instructions is critical for their deployment in real-world applications. However, existing evaluation methods often oversimplify instruction complexity as a mere…

Computation and Language · Computer Science 2026-03-10 Xiaona Xue , Yiqiao Huang , Jiacheng Li , Yuanhang Zheng , Huiqi Miao , Yunfei Ma , Rui Liu , Xinbao Sun , Minglu Liu , Fanyu Meng , Chao Deng , Junlan Feng

Code completion, a highly valuable topic in the software development domain, has seen increasing adoption thanks to recent advances in large language models (LLMs). To date, visible LLM-based code completion frameworks such as GitHub…

Software Engineering · Computer Science 2023-05-09 Zongjie Li , Chaozheng Wang , Zhibo Liu , Haoxuan Wang , Dong Chen , Shuai Wang , Cuiyun Gao

As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functional correctness, overlooking the diversity of…

Software Engineering · Computer Science 2025-11-03 Forough Mehralian , Ryan Shar , James R. Rae , Alireza Hashemi

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

Large Language Models for code (code LLMs) have witnessed tremendous progress in recent years. With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the…

Software Engineering · Computer Science 2024-11-15 Linyi Li , Shijie Geng , Zhenwen Li , Yibo He , Hao Yu , Ziyue Hua , Guanghan Ning , Siwei Wang , Tao Xie , Hongxia Yang

Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation…

Software Engineering · Computer Science 2024-06-07 Naman Jain , King Han , Alex Gu , Wen-Ding Li , Fanjia Yan , Tianjun Zhang , Sida Wang , Armando Solar-Lezama , Koushik Sen , Ion Stoica

With the rapid advancement of Large Language Models (LLMs), the demand for robust instruction-following capabilities in code generation tasks has grown significantly. Code generation not only facilitates faster prototyping and automated…

Software Engineering · Computer Science 2025-08-05 Kaiwen Yan , Hongcheng Guo , Xuanqing Shi , Shaosheng Cao , Donglin Di , Zhoujun Li

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science 2025-08-04 Jian Yang , Wei Zhang , Shukai Liu , Linzheng Chai , Yingshui Tan , Jiaheng Liu , Ge Zhang , Wangchunshu Zhou , Guanglin Niu , Zhoujun Li , Binyuan Hui , Junyang Lin

Evaluating the performance of Code Language Models (CLMs) for software engineering tasks, especially in multilingual and low-resource programming language settings, poses significant challenges. These challenges are primarily due to the…

Software Engineering · Computer Science 2024-11-26 Rohit Dandamudi , Gema Rodríguez-Pérez

Large language models (LLMs) have advanced significantly in code generation, yet their ability to follow complex programming instructions with layered and diverse constraints remains underexplored. Existing benchmarks often prioritize…

Software Engineering · Computer Science 2025-07-02 Guoliang Duan , Mingwei Liu , Yanlin Wang , Chong Wang , Xin Peng , Zibin Zheng

Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on a single evaluation task and paradigm, such as code completion…

Computation and Language · Computer Science 2025-05-16 Yutao Mou , Xiao Deng , Yuxiao Luo , Shikun Zhang , Wei Ye

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

Large language models (LLMs) have achieved strong performance on code completion tasks in general-purpose programming languages. However, existing repository-level code completion benchmarks focus almost exclusively on software code and…

Programming Languages · Computer Science 2026-02-03 Qingyun Zou , Jiahao Cui , Nuo Chen , Bingsheng He , Weng-Fai Wong

Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution…

Software Engineering · Computer Science 2023-07-26 Michele Tufano , Shubham Chandel , Anisha Agarwal , Neel Sundaresan , Colin Clement

While large language models (LLMs) have exhibited impressive instruction-following capabilities, it is still unclear whether and to what extent they can respond to explicit constraints that might be entailed in various instructions. As a…

Computation and Language · Computer Science 2024-01-02 Yihan Chen , Benfeng Xu , Quan Wang , Yi Liu , Zhendong Mao

Code-related benchmarks play a critical role in evaluating large language models (LLMs), yet their quality fundamentally shapes how the community interprets model capabilities. In the past few years, awareness of benchmark quality has…
