Related papers: Data Quality for Software Vulnerability Datasets

200 papers

Vulnerability detection is a crucial yet challenging task to identify potential weaknesses in software for cyber security. Recently, deep learning (DL) has made great progress in automating the detection process. Due to the complex…

Cryptography and Security · Computer Science 2024-10-10 Yuejun Guo , Seifeddine Bettaieb

Large Language Models (LLMs) are of great interest in vulnerability detection and repair. The effectiveness of these models hinges on the quality of the datasets used for both training and evaluation. Our investigation reveals that a number…

Software Engineering · Computer Science 2025-03-11 Anurag Swarnim Yadav , Joseph N. Wilson

Machine Learning (ML) models are being increasingly employed for credit risk evaluation, with their effectiveness largely hinging on the quality of the input data. In this paper we investigate the impact of several data quality issues,…

Machine Learning · Computer Science 2025-11-18 Andrea Maurino

Data-driven software engineering processes, such as vulnerability prediction, heavily rely on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Despite…

Software Engineering · Computer Science 2022-04-04 Roland Croft , M. Ali Babar , Huaming Chen

Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location…

Software Engineering · Computer Science 2020-12-22 Michael F. Bosu , Stephen G. MacDonell

Context: The utility of prediction models in empirical software engineering (ESE) is heavily reliant on the quality of the data used in building those models. Several data quality challenges such as noise, incompleteness, outliers and…

Software Engineering · Computer Science 2021-05-25 Michael Franklin Bosu , Stephen G. MacDonell

In recent years, machine learning has demonstrated impressive results in various fields, including software vulnerability detection. Nonetheless, using machine learning to identify software vulnerabilities presents new challenges,…

Cryptography and Security · Computer Science 2025-08-22 Sima Arasteh , Christophe Hauser

Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques either suffer from high false positives or false negatives. Recent progress in Deep Learning (DL) has…

Software Engineering · Computer Science 2020-09-16 Saikat Chakraborty , Rahul Krishna , Yangruibo Ding , Baishakhi Ray

Datasets serve as crucial training resources and model performance trackers. However, existing datasets have exposed a plethora of problems, inducing biased models and unreliable evaluation results. In this paper, we propose a…

Computation and Language · Computer Science 2022-12-20 Chengwen Wang , Qingxiu Dong , Xiaochen Wang , Haitao Wang , Zhifang Sui

Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of…

Software Engineering · Computer Science 2021-06-14 Michael Franklin Bosu , Stephen G. MacDonell

Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile,…

Software Engineering · Computer Science 2024-07-26 Triet H. M. Le , M. Ali Babar

Software vulnerabilities can have serious consequences, which is why many techniques have been proposed to defend against them. Among these, vulnerability detection techniques are a major area of focus. However, there is a lack of a…

Software Engineering · Computer Science 2023-03-30 Yingzhou Bi , Jiangtao Huang , Penghui Liu , Lianmei Wang

The label quality of defect data sets has a direct influence on the reliability of defect prediction models. In this study, for multi-version-project defect data sets, we propose an approach to automatically detecting instances with…

Software Engineering · Computer Science 2021-01-29 Shiran Liu , Zhaoqiang Guo , Yanhui Li , Chuanqi Wang , Lin Chen , Zhongbin Sun , Yuming Zhou

Deep learning (DL) techniques have achieved significant success in various software engineering tasks (e.g., code completion by Copilot). However, DL systems are prone to bugs from many sources, including training data. Existing literature…

Software Engineering · Computer Science 2025-08-12 Mehil B Shah , Mohammad Masudur Rahman , Foutse Khomh

Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult…

Software Engineering · Computer Science 2022-01-19 Roland Croft , M. Ali Babar , Li Li

Software vulnerability detection is critical in software security because it identifies potential bugs in software systems, enabling immediate remediation and mitigation measures to be implemented before they may be exploited. Automatic…

Software Engineering · Computer Science 2023-06-21 Nima Shiri Harzevili , Alvine Boaye Belle , Junjie Wang , Song Wang , Zhen Ming Jiang , Nachiappan Nagappan

The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of…

Software Engineering · Computer Science 2024-07-04 Partha Chakraborty , Krishna Kanth Arumugam , Mahmoud Alfadel , Meiyappan Nagappan , Shane McIntosh

AI-based solutions demonstrate remarkable results in identifying vulnerabilities in software, but research has consistently found that this performance does not generalize to unseen codebases. In this paper, we specifically investigate the…

Cryptography and Security · Computer Science 2025-10-08 Rijha Safdar , Danyail Mateen , Syed Taha Ali , M. Umer Ashfaq , Wajahat Hussain

In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing…

Software Engineering · Computer Science 2024-07-11 Yangruibo Ding , Yanjun Fu , Omniyyah Ibrahim , Chawin Sitawarin , Xinyun Chen , Basel Alomair , David Wagner , Baishakhi Ray , Yizheng Chen

Vulnerability detection is crucial to protect software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, leveraging its superior ability to extract patterns and representations within…

Software Engineering · Computer Science 2026-02-13 Yuejun Guo , Qiang Hu , Qiang Tang , Yves Le Traon