Related papers: Automated Software Vulnerability Static Code Analy…
Generative pre-trained transformers (GPTs) are a type of large language model that is unusually adept at producing novel and coherent natural language. In this study, the ability of GPT models to generate novel and…
The increasing complexity and volume of software systems have heightened the importance of identifying and mitigating security vulnerabilities. The existing software vulnerability datasets frequently fall short in providing comprehensive,…
This study compares state-of-the-art Large Language Models (LLMs) on their tendency to generate vulnerabilities when writing C programs using a neutral zero-shot prompt. Tihanyi et al. introduced the FormAI dataset at PROMISE'23, featuring…
While several studies have examined the security of code generated by GPT and other Large Language Models (LLMs), most have relied on controlled experiments rather than real developer interactions. This paper investigates the security of…
Security vulnerabilities in code written in diverse programming languages are among the most critical yet complicated aspects of source code to detect. Static analysis tools based on rule-based patterns usually do…
In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. We fine-tune two models from Meta's Code Llama…
In this study, we evaluated the capability of Large Language Models (LLMs), particularly OpenAI's GPT-4, in detecting software vulnerabilities, comparing their performance against traditional static code analyzers like Snyk and Fortify. Our…
AI-powered coding assistants such as GitHub's Copilot and OpenAI's ChatGPT have achieved notable success in automating code generation. However, these tools rely on pre-trained Large Language Models (LLMs) that are typically trained on…
This paper presents the FormAI dataset, a large collection of 112,000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse…
Thousands of security vulnerabilities are discovered in production software each year, either reported publicly to the Common Vulnerabilities and Exposures database or discovered internally in proprietary code. Vulnerabilities often…
With the rapid advancements in Natural Language Processing (NLP), large language models (LLMs) like GPT-4 have gained significant traction in diverse applications, including security vulnerability scanning. This paper investigates the…
Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks trained from scratch. Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable…
Detecting security vulnerabilities in software before they are exploited has been a challenging problem for decades. Traditional code analysis methods have been proposed, but are often ineffective and inefficient. In this work, we model…
Large transformer-based language models demonstrate excellent performance in natural language processing. By considering the transferability of the knowledge gained by these models in one domain to other related domains, and the…
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains…
Most vulnerability detection studies focus on datasets of vulnerabilities in C/C++ code, offering limited language diversity. Thus, the effectiveness of deep learning methods, including large language models (LLMs), in detecting software…
This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories. We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT…
Accurate and efficient language translation is an extremely important information processing task. Fast, accurate automated translation enabled by machine learning is often a major topic of interest in the machine…
Early identification of security issues in software development is vital to minimize their unanticipated impacts. Code review is a widely used manual analysis method that aims to uncover security issues along with other coding issues in…
This study explores the limitations of traditional Cybersecurity Awareness and Training (CSAT) programs and proposes an innovative solution using Generative Pre-Trained Transformers (GPT) to address these shortcomings. Traditional…