English
Related papers

Related papers: AutoBaxBuilder: Bootstrapping Code Security Benchm…

200 papers

Automatic program generation has long been a fundamental challenge in computer science. Recent benchmarks have shown that large language models (LLMs) can effectively generate code at the function level, make code edits, and solve…

Cryptography and Security · Computer Science 2025-06-02 Mark Vero , Niels Mündler , Victor Chibotaru , Veselin Raychev , Maximilian Baader , Nikola Jovanović , Jingxuan He , Martin Vechev

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to evaluate their code generation abilities,…

Software engineers in various industrial domains are already using Large Language Models (LLMs) to accelerate the process of implementing parts of software systems. When considering its potential use for ADAS or AD systems in the automotive…

Software Engineering · Computer Science 2025-05-27 Ali Nouri , Beatriz Cabrero-Daniel , Zhennan Fei , Krishna Ronanki , Håkan Sivencrona , Christian Berger

The code generation capabilities of large language models(LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science 2025-06-23 Xinghang Li , Jingzhe Ding , Chao Peng , Bing Zhao , Xiang Gao , Hongwan Gao , Xinchen Gu

Large language models (LLMs) are being increasingly integrated into practical hardware and firmware development pipelines for code generation. Existing studies have primarily focused on evaluating the functional correctness of LLM-generated…

Cryptography and Security · Computer Science 2026-01-21 Qirui Chen , Jingxian Shuai , Shuangwu Chen , Shenghao Ye , Zijian Wen , Xufei Su , Jie Jin , Jiangming Li , Jun Chen , Xiaobin Tan , Jian Yang

The rapid evolution of Large Language Models (LLMs) has outpaced the development of model evaluation, highlighting the need for continuous curation of new, challenging benchmarks. However, manual curation of high-quality, human-aligned…

Machine Learning · Computer Science 2024-10-16 Tianle Li , Wei-Lin Chiang , Evan Frick , Lisa Dunlap , Tianhao Wu , Banghua Zhu , Joseph E. Gonzalez , Ion Stoica

The rapid advancement of large language models (LLMs) has led to a surge in both model supply and application demands. To facilitate effective matching between them, reliable, generic and efficient benchmark generators are widely needed.…

Computation and Language · Computer Science 2025-02-05 Peiwen Yuan , Shaoxiong Feng , Yiwei Li , Xinglin Wang , Yueqi Zhang , Jiayi Shi , Chuyi Tan , Boyuan Pan , Yao Hu , Kan Li

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair…

Cryptography and Security · Computer Science 2023-10-24 Hossein Hajipour , Keno Hassler , Thorsten Holz , Lea Schönherr , Mario Fritz

With the rising demand for code quality assurance, developers are not only utilizing existing static code checkers but also seeking custom checkers to satisfy their specific needs. Nowadays, various code-checking frameworks provide…

Software Engineering · Computer Science 2025-07-18 Jun Liu , Yuanyuan Xie , Jiwei Yan , Jinhao Huang , Jun Yan , Jian Zhang

The rapid proliferation of large language models (LLMs) has intensified the requirement for reliable safety evaluation to uncover model vulnerabilities. To this end, numerous LLM safety evaluation benchmarks are proposed. However, existing…

Computation and Language · Computer Science 2025-08-22 Xiangyang Zhu , Yuan Tian , Chunyi Li , Kaiwei Zhang , Wei Sun , Guangtao Zhai

Artificial Intelligence (AI)-driven code generation tools are increasingly used throughout the software development lifecycle to accelerate coding tasks. However, the security of AI-generated code using Large Language Models (LLMs) remains…

Cryptography and Security · Computer Science 2026-03-10 Mohammed Kharma , Soohyeon Choi , Mohammed AlKhanafseh , David Mohaisen

Large language models (LLMs) and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that generated code is secure without compromising its…

Software Engineering · Computer Science 2025-11-27 Abhijeet Pathak , Suvadra Barua , Dinesh Gudimetla , Rupam Patir , Jiawei Guo , Hongxin Hu , Haipeng Cai

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

The rapid advancement of Large Language Models (LLMs) has enhanced software development processes, minimizing the time and effort required for coding and enhancing developer productivity. However, despite their potential benefits, code…

Cryptography and Security · Computer Science 2025-04-30 Swaroop Dora , Deven Lunkad , Naziya Aslam , S. Venkatesan , Sandeep Kumar Shukla

Large vision-language models (LVLMs) exhibit remarkable capabilities in cross-modal tasks but face significant safety challenges, which undermine their reliability in real-world applications. Efforts have been made to build LVLM safety…

Computation and Language · Computer Science 2026-01-28 Xiangyang Zhu , Yuan Tian , Zicheng Zhang , Qi Jia , Chunyi Li , Renrui Zhang , Heng Li , Zongrui Wang , Wei Sun

The proliferation of Large Language Models (LLMs) has revolutionized natural language processing and significantly impacted code generation tasks, enhancing software development efficiency and productivity. Notably, LLMs like GPT-4 have…

Software Engineering · Computer Science 2025-03-25 Sheng Ouyang , Yihao Qin , Bo Lin , Liqian Chen , Xiaoguang Mao , Shangwen Wang

Recent advancements in automatic code generation using large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches often rely on a single agent for code generation, which…

Software Engineering · Computer Science 2024-11-06 Ana Nunez , Nafis Tanveer Islam , Sumit Kumar Jha , Peyman Najafirad

As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functional correctness, overlooking the diversity of…

Software Engineering · Computer Science 2025-11-03 Forough Mehralian , Ryan Shar , James R. Rae , Alireza Hashemi

In light of the rapid adoption of AI coding assistants, LLM-assisted development has become increasingly prevalent, creating an urgent need for robust evaluation of generated code quality. Existing benchmarks often require extensive manual…

Software Engineering · Computer Science 2025-05-21 Yuancheng Jiang , Roland Yap , Zhenkai Liang

Large Language Models (LLMs) are used for many tasks, including those related to coding. An important aspect of being able to utilize LLMs is the ability to assess their fitness for specific usages. The common practice is to evaluate LLMs…

Artificial Intelligence · Computer Science 2024-07-30 Marcel Zalmanovici , Orna Raz , Eitan Farchi , Iftach Freund
‹ Prev 1 2 3 10 Next ›