Software Vulnerability Prediction Knowledge Transferring Between Programming Languages

Software Engineering 2023-03-14 v1 Artificial Intelligence Machine Learning

Abstract

Automated, intelligent software vulnerability detection models have received considerable attention from both the research and development communities. One of the biggest challenges in this area is the lack of code samples for every programming language. In this study, we address this issue by proposing a transfer learning technique that leverages available datasets to produce a model that detects common vulnerabilities in different programming languages. We train a Convolutional Neural Network (CNN) model on C source code samples, then use Java source code samples to adapt and evaluate the learned model. The code samples come from two benchmark datasets: the NIST Software Assurance Reference Dataset (SARD) and the Draper VDISC dataset. The results show that the proposed model detects vulnerabilities in both C and Java code with an average recall of 72%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.
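
For readers unfamiliar with the metric, the following minimal sketch shows how a per-language recall and its average could be computed. It assumes binary labels (1 = vulnerable) and an unweighted mean over the two languages; the abstract does not state how the average was formed, and the names `recall`, `average_recall`, and the toy labels are hypothetical, not taken from the paper.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for binary labels (1 = vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_recall(per_language):
    """Return per-language recall and its unweighted mean (an assumption)."""
    scores = {lang: recall(t, p) for lang, (t, p) in per_language.items()}
    return scores, sum(scores.values()) / len(scores)


# Toy labels invented for illustration only; not data from the paper.
toy = {
    "C": ([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]),
    "Java": ([1, 1, 0, 0], [1, 0, 0, 0]),
}
scores, avg = average_recall(toy)
```

Recall counts only the vulnerable samples the model finds, so a false alarm on a safe sample (the last C prediction in the toy data) does not lower it.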

Cite

@article{arxiv.2303.06177,
  title  = {Software Vulnerability Prediction Knowledge Transferring Between Programming Languages},
  author = {Khadija Hanifi and Ramin F Fouladi and Basak Gencer Unsalver and Goksu Karadag},
  journal= {arXiv preprint arXiv:2303.06177},
  year   = {2023}
}

Comments

9 pages, 8 figures. Accepted for presentation at the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023), Prague, Czech Republic