Related papers: LLMSecCode: Evaluating Large Language Models for S…

LLM Security Guard for Code

Many developers rely on Large Language Models (LLMs) to facilitate software development. Nevertheless, these models have exhibited limited capabilities in the security domain. We introduce LLMSecGuard, a framework to offer enhanced code…

Software Engineering · Computer Science 2024-05-07 Arya Kavian , Mohammad Mehdi Pourhashem Kallehbasti , Sajjad Kazemi , Ehsan Firouzi , Mohammad Ghafari

SECURE: Benchmarking Large Language Models for Cybersecurity

Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also caused lower confidence due to problems like hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but…

Cryptography and Security · Computer Science 2024-10-31 Dipkamal Bhusal , Md Tanvirul Alam , Le Nguyen , Ashim Mahara , Zachary Lightcap , Rodney Frazier , Romy Fieblinger , Grace Long Torales , Benjamin A. Blakely , Nidhi Rastogi

Secure coding for web applications: Frameworks, challenges, and the role of LLMs

Secure coding is a critical yet often overlooked practice in software development. Despite extensive awareness efforts, real-world adoption remains inconsistent due to organizational, educational, and technical barriers. This paper provides…

Software Engineering · Computer Science 2025-10-02 Kiana Kiashemshaki , Mohammad Jalili Torkamani , Negin Mahmoudi

Leveraging Large Language Models for Trustworthiness Assessment of Web Applications

The widespread adoption of web applications has made their security a critical concern and has increased the need for systematic ways to assess whether they can be considered trustworthy. However, "trust" assessment remains an open problem…

Cryptography and Security · Computer Science 2026-03-26 Oleksandr Yarotskyi , José D'Abruzzo Pereira , João R. Campos

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

Large language models (LLMs) have brought significant advancements to code generation and code repair, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like…

Software Engineering · Computer Science 2024-07-08 Jiexin Wang , Xitong Luo , Liuwen Cao , Hongkui He , Hailin Huang , Jiayuan Xie , Adam Jatowt , Yi Cai

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces…

Software Engineering · Computer Science 2023-10-26 Jiexin Wang , Liuwen Cao , Xitong Luo , Zhiping Zhou , Jiayuan Xie , Adam Jatowt , Yi Cai

An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors

Security code review is a time-consuming and labor-intensive process typically requiring integration with automated security defect detection tools. However, existing security analysis tools struggle with poor generalization, high false…

Software Engineering · Computer Science 2026-05-12 Jiaxin Yu , Peng Liang , Yujia Fu , Amjed Tahir , Mojtaba Shahin , Chong Wang , Yangxiao Cai

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

The code generation capabilities of large language models(LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science 2025-06-23 Xinghang Li , Jingzhe Ding , Chao Peng , Bing Zhao , Xiang Gao , Hongwan Gao , Xinchen Gu

Software Vulnerability and Functionality Assessment using LLMs

While code review is central to the software development process, it can be tedious and expensive to carry out. In this paper, we investigate whether and how Large Language Models (LLMs) can aid with code reviews. Our investigation focuses…

Software Engineering · Computer Science 2024-03-14 Rasmus Ingemann Tuffveson Jensen , Vali Tawosi , Salwa Alamir

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety…

Cryptography and Security · Computer Science 2023-12-11 Manish Bhatt , Sahana Chennabasappa , Cyrus Nikolaidis , Shengye Wan , Ivan Evtimov , Dominik Gabi , Daniel Song , Faizan Ahmad , Cornelius Aschermann , Lorenzo Fontana , Sasha Frolov , Ravi Prakash Giri , Dhaval Kapil , Yiannis Kozyrakis , David LeBlanc , James Milazzo , Aleksandar Straumann , Gabriel Synnaeve , Varun Vontimitta , Spencer Whitman , Joshua Saxe

Rethinking the Evaluation of Secure Code Generation

Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current…

Cryptography and Security · Computer Science 2025-11-14 Shih-Chieh Dai , Jun Xu , Guanhong Tao

Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective

Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on single evaluation task and paradigm, such as code completion…

Computation and Language · Computer Science 2025-05-16 Yutao Mou , Xiao Deng , Yuxiao Luo , Shikun Zhang , Wei Ye

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…

Cryptography and Security · Computer Science 2025-02-14 Karl Tamberg , Hayretdin Bahsi

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of…

Software Engineering · Computer Science 2023-03-17 Catherine Tony , Markus Mutas , Nicolás E. Díaz Ferreyra , Riccardo Scandariato

The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models

The rapid advancement of Large Language Models (LLMs) has enhanced software development processes, minimizing the time and effort required for coding and enhancing developer productivity. However, despite their potential benefits, code…

Cryptography and Security · Computer Science 2025-04-30 Swaroop Dora , Deven Lunkad , Naziya Aslam , S. Venkatesan , Sandeep Kumar Shukla

Exploring Advanced Methodologies in Security Evaluation for LLMs

Large Language Models (LLMs) represent an advanced evolution of earlier, simpler language models. They boast enhanced abilities to handle complex language patterns and generate coherent text, images, audios, and videos. Furthermore, they…

Cryptography and Security · Computer Science 2024-03-01 Jun Huang , Jiawei Zhang , Qi Wang , Weihong Han , Yanchun Zhang

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities. We…

Cryptography and Security · Computer Science 2024-04-23 Manish Bhatt , Sahana Chennabasappa , Yue Li , Cyrus Nikolaidis , Daniel Song , Shengye Wan , Faizan Ahmad , Cornelius Aschermann , Yaohui Chen , Dhaval Kapil , David Molnar , Spencer Whitman , Joshua Saxe

Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis

Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence, and the software engineering community has readily noticed their potential role in the software development life-cycle.…

Software Engineering · Computer Science 2026-03-16 Greta Dolcetti , Vincenzo Arceri , Eleonora Iotti , Sergio Maffeis , Agostino Cortesi , Enea Zaffanella

SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity

Evaluating Large Language Models (LLMs) is crucial for understanding their capabilities and limitations across various applications, including natural language processing and code generation. Existing benchmarks like MMLU, C-Eval, and…

Cryptography and Security · Computer Science 2025-01-07 Pengfei Jing , Mengyun Tang , Xiaorong Shi , Xing Zheng , Sen Nie , Shi Wu , Yong Yang , Xiapu Luo

A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems. To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to…

Software Engineering · Computer Science 2024-06-13 Sinclair Hudson , Sophia Jit , Boyue Caroline Hu , Marsha Chechik