English
Related papers

Related papers: SecCodeBench-V2 Technical Report

200 papers

Existing benchmarks for evaluating the security risks and capabilities (e.g., vulnerability detection) of code-generating large language models (LLMs) face several key limitations: (1) limited coverage of risk and capabilities; (2) reliance…

Cryptography and Security · Computer Science 2025-09-22 Yuzhou Nie , Zhun Wang , Yu Yang , Ruizhe Jiang , Yuheng Tang , Xander Davies , Yarin Gal , Bo Li , Wenbo Guo , Dawn Song

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic…

Cryptography and Security · Computer Science 2026-02-02 Yanlin Wang , Ziyao Zhang , Chong Wang , Xinyi Xu , Mingwei Liu , Yong Wang , Jiachi Chen , Zibin Zheng

The code generation capabilities of large language models(LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science 2025-06-23 Xinghang Li , Jingzhe Ding , Chao Peng , Bing Zhao , Xiang Gao , Hongwan Gao , Xinchen Gu

Evaluating Large Language Models (LLMs) is crucial for understanding their capabilities and limitations across various applications, including natural language processing and code generation. Existing benchmarks like MMLU, C-Eval, and…

Cryptography and Security · Computer Science 2025-01-07 Pengfei Jing , Mengyun Tang , Xiaorong Shi , Xing Zheng , Sen Nie , Shi Wu , Yong Yang , Xiapu Luo

Large language models (LLMs) are being increasingly integrated into practical hardware and firmware development pipelines for code generation. Existing studies have primarily focused on evaluating the functional correctness of LLM-generated…

Cryptography and Security · Computer Science 2026-01-21 Qirui Chen , Jingxian Shuai , Shuangwu Chen , Shenghao Ye , Zijian Wen , Xufei Su , Jie Jin , Jiangming Li , Jun Chen , Xiaobin Tan , Jian Yang

Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based…

Software Engineering · Computer Science 2024-09-27 Quanjun Zhang , Ye Shang , Chunrong Fang , Siqi Gu , Jianyi Zhou , Zhenyu Chen

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair…

Cryptography and Security · Computer Science 2023-10-24 Hossein Hajipour , Keno Hassler , Thorsten Holz , Lea Schönherr , Mario Fritz

Large language models (LLMs) now support automated software security tasks, including vulnerability discovery and proof-of-concept (PoC) generation. Existing benchmarks do not faithfully evaluate LLMs in real-world bug hunting scenarios…

Cryptography and Security · Computer Science 2026-05-27 Hwiwon Lee , Jiawei Liu , Dongjun Kim , Ziqi Zhang , Chunqiu Steven Xia , Lingming Zhang

Code security and usability are both essential for various coding assistant applications driven by large language models (LLMs). Current code security benchmarks focus solely on single evaluation task and paradigm, such as code completion…

Computation and Language · Computer Science 2025-05-16 Yutao Mou , Xiao Deng , Yuxiao Luo , Shikun Zhang , Wei Ye

How to evaluate Large Language Models (LLMs) in code generation is an open question. Existing benchmarks demonstrate poor alignment with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs. This paper…

Computation and Language · Computer Science 2024-04-02 Jia Li , Ge Li , Xuanming Zhang , Yihong Dong , Zhi Jin

This paper introduces SecRepoBench, a benchmark to evaluate code agents on secure code completion in real-world repositories. SecRepoBench has 318 code completion tasks in 27 C/C++ repositories, covering 15 CWEs. We evaluate 29 standalone…

Cryptography and Security · Computer Science 2026-02-17 Chihao Shen , Connor Dilgren , Purva Chiniya , Luke Griffith , Yu Ding , Yizheng Chen

We introduce LiveSecBench, a continuously updated safety benchmark specifically for Chinese-language LLM application scenarios. LiveSecBench constructs a high-quality and unique dataset through a pipeline that combines automated generation…

Large Language Models (LLMs) are increasingly integrated into real-world applications via the Model Context Protocol (MCP), a universal open standard for connecting AI agents with data sources and external tools. While MCP enhances the…

Cryptography and Security · Computer Science 2026-02-13 Yixuan Yang , Cuifeng Gao , Daoyuan Wu , Yufan Chen , Yingjiu Li , Shuai Wang

The emergence of long-context language models with context windows extending to millions of tokens has created new opportunities for sophisticated code understanding and software development evaluation. We propose LoCoBench, a comprehensive…

DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on realistic code completion tasks. It includes 1,800 evaluation instances across six programming languages and six task categories derived from real…

Machine Learning · Computer Science 2026-05-19 Adarsh Kumarappan , Pareesa Ameneh Golnari , Wen Wen , Xiaoyu Liu , Gabriel Ryan , Yuting Sun , Shengyu Fu , Elsie Nallipogu

Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use…

Cryptography and Security · Computer Science 2026-02-12 Weichen Yu , Ravi Mangal , Yinyi Luo , Kai Hu , Jingxuan He , Corina S. Pasareanu , Matt Fredrikson

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to evaluate their code generation abilities,…

Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program…

Cryptography and Security · Computer Science 2024-08-22 Yu Liu , Lang Gao , Mingxin Yang , Yu Xie , Ping Chen , Xiaojin Zhang , Wei Chen

Autonomous unmanned aerial vehicle (UAV) systems are increasingly deployed in safety-critical, networked environments where they must operate reliably in the presence of malicious adversaries. While recent benchmarks have evaluated large…

Cryptography and Security · Computer Science 2026-01-27 Mohamed Amine Ferrag , Abderrahmane Lakas , Merouane Debbah

Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern. Existing benchmarks have provided valuable insights, but they fail to…

Software Engineering · Computer Science 2026-04-27 Junkai Chen , Huihui Huang , Yunbo Lyu , Junwen An , Jieke Shi , Chengran Yang , Ting Zhang , Haoye Tian , Yikun Li , Zhenhao Li , Xin Zhou , Xing Hu , David Lo
‹ Prev 1 2 3 10 Next ›