Related papers: Is Your AI-Generated Code Really Safe? Evaluating …

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces…

Software Engineering · Computer Science · 2023-10-26 · Jiexin Wang, Liuwen Cao, Xitong Luo, Zhiping Zhou, Jiayuan Xie, Adam Jatowt, Yi Cai

Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

Artificial Intelligence (AI)-driven code generation tools are increasingly used throughout the software development lifecycle to accelerate coding tasks. However, the security of AI-generated code using Large Language Models (LLMs) remains…

Cryptography and Security · Computer Science · 2026-03-10 · Mohammed Kharma, Soohyeon Choi, Mohammed AlKhanafseh, David Mohaisen

LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

The security of code generated by large language models (LLMs) is a significant concern, as studies indicate that such code often contains vulnerabilities and lacks essential defensive programming constructs. This work focuses on examining…

Artificial Intelligence · Computer Science · 2025-11-25 · Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan

Guiding AI to Fix Its Own Flaws: An Empirical Study on LLM-Driven Secure Code Generation

Large Language Models (LLMs) have become powerful tools for automated code generation. However, these models often overlook critical security practices, which can result in the generation of insecure code that contains…

Software Engineering · Computer Science · 2025-07-01 · Hao Yan, Swapneel Suhas Vaidya, Xiaokuan Zhang, Ziyu Yao

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

Large Language Models (LLMs) such as ChatGPT and GitHub Copilot have revolutionized automated code generation in software engineering. However, as these models are increasingly utilized for software development, concerns have arisen…

Cryptography and Security · Computer Science · 2024-12-03 · Ahmad Mohsin, Helge Janicke, Adrian Wood, Iqbal H. Sarker, Leandros Maglaras, Naeem Janjua

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of…

Software Engineering · Computer Science · 2023-03-17 · Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science · 2025-06-23 · Xinghang Li, Jingzhe Ding, Chao Peng, Bing Zhao, Xiang Gao, Hongwan Gao, Xinchen Gu

Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective

Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on a single evaluation task and paradigm, such as code completion…

Computation and Language · Computer Science · 2025-05-16 · Yutao Mou, Xiao Deng, Yuxiao Luo, Shikun Zhang, Wei Ye

The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models

The rapid advancement of Large Language Models (LLMs) has enhanced software development processes, minimizing the time and effort required for coding and enhancing developer productivity. However, despite their potential benefits, code…

Cryptography and Security · Computer Science · 2025-04-30 · Swaroop Dora, Deven Lunkad, Naziya Aslam, S. Venkatesan, Sandeep Kumar Shukla

Security of LLM-generated Code: A Comparative Analysis

The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improving productivity and faster learning. In fact, Large Language Model…

Software Engineering · Computer Science · 2026-05-25 · Srivathsan G Morkonda, Mahmoud Selim, Hala Assal

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety…

Cryptography and Security · Computer Science · 2023-12-11 · Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, Joshua Saxe

Rethinking the Evaluation of Secure Code Generation

Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current…

Cryptography and Security · Computer Science · 2025-11-14 · Shih-Chieh Dai, Jun Xu, Guanhong Tao

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair…

Cryptography and Security · Computer Science · 2023-10-24 · Hossein Hajipour, Keno Hassler, Thorsten Holz, Lea Schönherr, Mario Fritz

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI-assisted programming…

Software Engineering · Computer Science · 2025-09-19 · Keke Lian, Bin Wang, Lei Zhang, Libo Chen, Junjie Wang, Ziming Zhao, Yujiu Yang, Miaoqian Lin, Haotong Duan, Haoran Zhao, Shuang Liao, Mingda Guo, Jiazheng Quan, Yilu Zhong, Chenhao He, Zichuan Chen, Jie Wu, Haoling Li, Zhaoxuan Li, Jiongchi Yu, Hui Li, Dong Zhang

CFCEval: Evaluating Security Aspects in Code Generated by Large Language Models

Code-focused Large Language Models (LLMs), such as Codex and StarCoder, have demonstrated remarkable capabilities in enhancing developer productivity through context-aware code generation. However, evaluating the quality and security of…

Software Engineering · Computer Science · 2025-12-09 · Cheng Cheng, Jinqiu Yang

SALLM: Security Assessment of Generated Code

With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although…

Software Engineering · Computer Science · 2024-09-06 · Mohammed Latif Siddiq, Joanna C. S. Santos, Sajith Devareddy, Anna Muller

CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation

Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing, enhancing productivity across various tasks. While identifying incorrect code is often straightforward, detecting vulnerabilities…

Software Engineering · Computer Science · 2025-01-15 · Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising…

Computation and Language · Computer Science · 2023-10-03 · Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

LLM Security Guard for Code

Many developers rely on Large Language Models (LLMs) to facilitate software development. Nevertheless, these models have exhibited limited capabilities in the security domain. We introduce LLMSecGuard, a framework to offer enhanced code…

Software Engineering · Computer Science · 2024-05-07 · Arya Kavian, Mohammad Mehdi Pourhashem Kallehbasti, Sajjad Kazemi, Ehsan Firouzi, Mohammad Ghafari

Software Vulnerability and Functionality Assessment using LLMs

While code review is central to the software development process, it can be tedious and expensive to carry out. In this paper, we investigate whether and how Large Language Models (LLMs) can aid with code reviews. Our investigation focuses…

Software Engineering · Computer Science · 2024-03-14 · Rasmus Ingemann Tuffveson Jensen, Vali Tawosi, Salwa Alamir