Related papers: Instruction Tuning for Secure Code Generation

Secure-Instruct: An Automated Pipeline for Synthesizing Instruction-Tuning Datasets Using LLMs for Secure Code Generation

Although Large Language Models (LLMs) show promising solutions to automated code generation, they often produce insecure code that threatens software security. Current approaches (e.g., SafeCoder) to improve secure code generation are…

Software Engineering · Computer Science · 2025-11-25 · Junjie Li, Fazle Rabbi, Bo Yang, Song Wang, Jinqiu Yang

Toward Secure Tuning: Mitigating Security Risks from Instruction Fine-Tuning

Instruction fine-tuning has emerged as a critical technique for customizing Large Language Models (LLMs) to specific applications. However, recent studies have highlighted significant security vulnerabilities in fine-tuned LLMs. Existing…

Computation and Language · Computer Science · 2025-02-18 · Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Shuren Qi, Fenglei Fan, Ting Liu, Bing Qin

InstructCoder: Instruction Tuning Large Language Models for Code Editing

Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due…

Computation and Language · Computer Science · 2024-02-29 · Kaixin Li, Qisheng Hu, Xu Zhao, Hui Chen, Yuxi Xie, Tiedong Liu, Qizhe Xie, Junxian He

HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data

Large language models (LLMs) have shown great potential for automatic code generation and form the basis for various tools such as GitHub Copilot. However, recent studies highlight that many LLM-generated code contains serious security…

Cryptography and Security · Computer Science · 2024-09-11 · Hossein Hajipour, Lea Schönherr, Thorsten Holz, Mario Fritz

WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning

Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs…

Computation and Language · Computer Science · 2024-06-10 · Zhaojian Yu, Xin Zhang, Ning Shang, Yangyu Huang, Can Xu, Yishujie Zhao, Wenxiang Hu, Qiufeng Yin

Enhancing Source Code Security with LLMs: Demystifying The Challenges and Generating Reliable Repairs

With the recent unprecedented advancements in Artificial Intelligence (AI) computing, progress in Large Language Models (LLMs) is accelerating rapidly, presenting challenges in establishing clear guidelines, particularly in the field of…

Cryptography and Security · Computer Science · 2024-09-04 · Nafis Tanveer Islam, Joseph Khoury, Andrew Seong, Elias Bou-Harb, Peyman Najafirad

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most malicious instructions and readily…

Computation and Language · Computer Science · 2024-03-20 · Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou

Locking Down the Finetuned LLMs Safety

Fine-tuning large language models (LLMs) on additional datasets is often necessary to optimize them for specific downstream tasks. However, existing safety alignment measures, which restrict harmful behavior during inference, are…

Computation and Language · Computer Science · 2024-10-15 · Minjun Zhu, Linyi Yang, Yifan Wei, Ningyu Zhang, Yue Zhang

SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation

As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently…

Cryptography and Security · Computer Science · 2026-05-01 · Mahshid Rezakhani, Nowfel Mashnoor, Kimia Azar, Hadi Kamali

SecCoder: Towards Generalizable and Robust Secure Code Generation

After large models (LMs) have gained widespread acceptance in code-related tasks, their superior generative capacity has greatly promoted the application of the code LM. Nevertheless, the security of the generated code has raised attention…

Programming Languages · Computer Science · 2024-10-03 · Boyu Zhang, Tianyu Du, Junkai Tong, Xuhong Zhang, Kingsum Chow, Sheng Cheng, Xun Wang, Jianwei Yin

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science · 2025-02-18 · Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen

Constrained Decoding for Secure Code Generation

Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure.…

Cryptography and Security · Computer Science · 2024-07-23 · Yanjun Fu, Ethan Baker, Yu Ding, Yizheng Chen

CodecLM: Aligning Language Models with Tailored Synthetic Data

Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor…

Computation and Language · Computer Science · 2024-04-10 · Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister

An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation

AI-powered coding assistants such as GitHub's Copilot and OpenAI's ChatGPT have achieved notable success in automating code generation. However, these tools rely on pre-trained Large Language Models (LLMs) that are typically trained on…

Software Engineering · Computer Science · 2025-09-30 · Junjie Li, Fazle Rabbi, Cheng Cheng, Aseem Sangalay, Yuan Tian, Jinqiu Yang

Finetuning Large Language Models for Vulnerability Detection

This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for…

Cryptography and Security · Computer Science · 2024-07-30 · Alexey Shestov, Rodion Levichev, Ravil Mussabayev, Evgeny Maslov, Anton Cheshkov, Pavel Zadorozhny

Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models

With the recent advancement of Large Language Models (LLMs), generating functionally correct code has become less complicated for a wide array of developers. While using LLMs has sped up the functional development process, it poses a heavy…

Cryptography and Security · Computer Science · 2024-02-01 · Nafis Tanveer Islam, Mohammad Bahrami Karkevandi, Peyman Najafirad

Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection

Software, while beneficial, poses potential cybersecurity risks due to inherent vulnerabilities. Detecting these vulnerabilities is crucial, and deep learning has shown promise as an effective tool for this task due to its ability to…

Software Engineering · Computer Science · 2024-01-17 · Imam Nur Bani Yusuf, Lingxiao Jiang

SafeTuneBed: A Toolkit for Benchmarking LLM Safety Alignment in Fine-Tuning

As large language models (LLMs) become ubiquitous, parameter-efficient fine-tuning methods and safety-first defenses have proliferated rapidly. However, the number of approaches and their recent increase have resulted in diverse…

Machine Learning · Computer Science · 2025-06-03 · Saad Hossain, Samanvay Vajpayee, Sirisha Rambhatla

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. Existing secure code alignment methods often suffer from a…

Cryptography and Security · Computer Science · 2026-02-10 · Tianyi Wu, Mingzhe Du, Yue Liu, Chengran Yang, Terry Yue Zhuo, Jiaheng Zhang, See-Kiong Ng

Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency

Fine-tuning a general-purpose large language model (LLM) for a specific domain or task has become a routine procedure for ordinary users. However, fine-tuning is known to remove the safety alignment features of the model, even when the…

Computation and Language · Computer Science · 2025-06-23 · Kathleen C. Fraser, Hillary Dawkins, Isar Nejadgholi, Svetlana Kiritchenko