Related papers: Data Quality for Software Vulnerability Datasets
Vulnerability detection is a crucial yet challenging task to identify potential weaknesses in software for cyber security. Recently, deep learning (DL) has made great progress in automating the detection process. Due to the complex…
Large Language Models (LLMs) are of great interest in vulnerability detection and repair. The effectiveness of these models hinges on the quality of the datasets used for both training and evaluation. Our investigation reveals that a number…
Machine Learning (ML) models are being increasingly employed for credit risk evaluation, with their effectiveness largely hinging on the quality of the input data. In this paper we investigate the impact of several data quality issues,…
Data-driven software engineering processes, such as vulnerability prediction heavily rely on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Despite…
Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location…
Context: The utility of prediction models in empirical software engineering (ESE) is heavily reliant on the quality of the data used in building those models. Several data quality challenges such as noise, incompleteness, outliers and…
In recent years, machine learning has demonstrated impressive results in various fields, including software vulnerability detection. Nonetheless, using machine learning to identify software vulnerabilities presents new challenges,…
Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques either suffer from high false positives or false negatives. Recent progress in Deep Learning (DL) has…
Datasets serve as crucial training resources and model performance trackers. However, existing datasets have exposed a plethora of problems, inducing biased models and unreliable evaluation results. In this paper, we propose a…
Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of…
Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile,…
Software vulnerabilities can have serious consequences, which is why many techniques have been proposed to defend against them. Among these, vulnerability detection techniques are a major area of focus. However, there is a lack of a…
The label quality of defect data sets has a direct influence on the reliability of defect prediction models. In this study, for multi-version-project defect data sets, we propose an approach to automatically detecting instances with…
Deep learning (DL) techniques have achieved significant success in various software engineering tasks (e.g., code completion by Copilot). However, DL systems are prone to bugs from many sources, including training data. Existing literature…
Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult…
Software vulnerability detection is critical in software security because it identifies potential bugs in software systems, enabling immediate remediation and mitigation measures to be implemented before they may be exploited. Automatic…
The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of…
AI-based solutions demonstrate remarkable results in identifying vulnerabilities in software, but research has consistently found that this performance does not generalize to unseen codebases. In this paper, we specifically investigate the…
In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing…
Vulnerability detection is crucial to protect software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, leveraging its superior ability to extract patterns and representations within…