Related papers: CFCEval: Evaluating Security Aspects in Code Gener…

CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation

Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing, enhancing productivity across various tasks. While identifying incorrect code is often straightforward, detecting vulnerabilities…

Software Engineering · Computer Science · 2025-01-15 · Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

Large language models (LLMs) have brought significant advancements to code generation and code repair, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like…

Software Engineering · Computer Science · 2024-07-08 · Jiexin Wang, Xitong Luo, Liuwen Cao, Hongkui He, Hailin Huang, Jiayuan Xie, Adam Jatowt, Yi Cai

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science · 2025-06-23 · Xinghang Li, Jingzhe Ding, Chao Peng, Bing Zhao, Xiang Gao, Hongwan Gao, Xinchen Gu

LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

The security of code generated by large language models (LLMs) is a significant concern, as studies indicate that such code often contains vulnerabilities and lacks essential defensive programming constructs. This work focuses on examining…

Artificial Intelligence · Computer Science · 2025-11-25 · Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan

CodeScore: Evaluating Code Generation by Learning Code Execution

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) suffer from…

Software Engineering · Computer Science · 2024-09-06 · Yihong Dong, Jiazheng Ding, Xue Jiang, Ge Li, Zhuo Li, Zhi Jin

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising…

Computation and Language · Computer Science · 2023-10-03 · Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of…

Software Engineering · Computer Science · 2023-03-17 · Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?

Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks. However, these benchmarks may not fully capture a model's code…

Software Engineering · Computer Science · 2024-09-16 · Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma

Rethinking the Evaluation of Secure Code Generation

Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current…

Cryptography and Security · Computer Science · 2025-11-14 · Shih-Chieh Dai, Jun Xu, Guanhong Tao

FairCoder: Evaluating Social Bias of LLMs in Code Generation

Large language models (LLMs) have been widely deployed in coding tasks, drawing increasing attention to the evaluation of the quality and safety of LLMs' outputs. However, research on bias in code generation remains limited. Existing…

Computation and Language · Computer Science · 2025-04-03 · Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety…

Cryptography and Security · Computer Science · 2023-12-11 · Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, Joshua Saxe

RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic…

Cryptography and Security · Computer Science · 2026-02-02 · Yanlin Wang, Ziyao Zhang, Chong Wang, Xinyi Xu, Mingwei Liu, Yong Wang, Jiachi Chen, Zibin Zheng

CodeEval: A pedagogical approach for targeted evaluation of code-trained Large Language Models

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science · 2026-01-08 · Danny Brahman, Mohammad Mahoor

IFEvalCode: Controlled Code Generation

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science · 2025-08-04 · Jian Yang, Wei Zhang, Shukai Liu, Linzheng Chai, Yingshui Tan, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou, Guanglin Niu, Zhoujun Li, Binyuan Hui, Junyang Lin

Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective

Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on a single evaluation task and paradigm, such as code completion…

Computation and Language · Computer Science · 2025-05-16 · Yutao Mou, Xiao Deng, Yuxiao Luo, Shikun Zhang, Wei Ye

Exploring Multi-Lingual Bias of Large Code Models in Code Generation

Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models…

Software Engineering · Computer Science · 2024-05-01 · Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

ICE-Score: Instructing Large Language Models to Evaluate Code

Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine…

Artificial Intelligence · Computer Science · 2024-01-23 · Terry Yue Zhuo

SALLM: Security Assessment of Generated Code

With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although…

Software Engineering · Computer Science · 2024-09-06 · Mohammed Latif Siddiq, Joanna C. S. Santos, Sajith Devareddy, Anna Muller

Vulnerability Detection with Code Language Models: How Far Are We?

In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing…

Software Engineering · Computer Science · 2024-07-11 · Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces…

Software Engineering · Computer Science · 2023-10-26 · Jiexin Wang, Liuwen Cao, Xitong Luo, Zhiping Zhou, Jiayuan Xie, Adam Jatowt, Yi Cai