Related papers: ActiveClean: Generating Line-Level Vulnerability D…

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models

Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, out-of-date, or outlier data. Identifying dirty data is often a…

Databases · Computer Science 2016-01-18 Sanjay Krishnan, Jiannan Wang, Eugene Wu, Michael J. Franklin, Ken Goldberg

Smart Cuts: Enhance Active Learning for Vulnerability Detection by Pruning Hard-to-Learn Data

Vulnerability detection is crucial for identifying security weaknesses in software systems. However, training effective machine learning models for this task is often constrained by the high cost and expertise required for data annotation.…

Cryptography and Security · Computer Science 2025-08-19 Xiang Lan, Tim Menzies, Bowen Xu

DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning

It is increasingly suggested to identify Software Vulnerabilities (SVs) in code commits to give early warnings about potential security risks. However, there is a lack of effort to assess vulnerability-contributing commits right after they…

Software Engineering · Computer Science 2021-08-19 Triet H. M. Le, David Hin, Roland Croft, M. Ali Babar

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection

We propose and release a new vulnerable source code dataset. We curate the dataset by crawling security issue websites, extracting vulnerability-fixing commits and source codes from the corresponding projects. Our new dataset contains…

Cryptography and Security · Computer Science 2023-08-10 Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, David Wagner

CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics

Accurate identification of software vulnerabilities is crucial for system integrity. Vulnerability datasets, often derived from the National Vulnerability Database (NVD) or directly from GitHub, are essential for training machine learning…

Software Engineering · Computer Science 2025-09-12 Yikun Li, Ting Zhang, Ratnadira Widyasari, Yan Naing Tun, Huu Hung Nguyen, Tan Bui, Ivana Clairine Irsan, Yiran Cheng, Xiang Lan, Han Wei Ang, Frank Liauw, Martin Weyssow, Hong Jin Kang, Eng Lieh Ouh, Lwin Khin Shar, David Lo

LineVD: Statement-level Vulnerability Detection using Graph Neural Networks

Current machine-learning based software vulnerability detection methods are primarily conducted at the function-level. However, a key limitation of these methods is that they do not indicate the specific lines of code contributing to…

Cryptography and Security · Computer Science 2022-03-28 David Hin, Andrey Kan, Huaming Chen, M. Ali Babar

Evaluating LLaMA 3.2 for Software Vulnerability Detection

Deep Learning (DL) has emerged as a powerful tool for vulnerability detection, often outperforming traditional solutions. However, developing effective DL models requires large amounts of real-world data, which can be difficult to obtain in…

Machine Learning · Computer Science 2025-03-12 José Gonçalves, Miguel Silva, Bernardo Cabral, Tiago Dias, Eva Maia, Isabel Praça, Ricardo Severino, Luís Lino Ferreira

Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets

The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of…

Software Engineering · Computer Science 2024-07-04 Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel, Meiyappan Nagappan, Shane McIntosh

ICVul: A Well-labeled C/C++ Vulnerability Dataset with Comprehensive Metadata and VCCs

Machine learning-based software vulnerability detection requires high-quality datasets, which is essential for training effective models. To address challenges related to data label quality, diversity, and comprehensiveness, we constructed…

Software Engineering · Computer Science 2025-05-14 Chaomeng Lu, Tianyu Li, Toon Dehaene, Bert Lagaisse

FuncVul: An Effective Function Level Vulnerability Detection Model using LLM and Code Chunk

Software supply chain vulnerabilities arise when attackers exploit weaknesses by injecting vulnerable code into widely used packages or libraries within software repositories. While most existing approaches focus on identifying vulnerable…

Cryptography and Security · Computer Science 2025-06-25 Sajal Halder, Muhammad Ejaz Ahmed, Seyit Camtepe

From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection

Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggests that both graph neural network (GNN)-based and…

Cryptography and Security · Computer Science 2025-12-12 Chaomeng Lu, Bert Lagaisse

Selection-Based Vulnerabilities: Clean-Label Backdoor Attacks in Active Learning

Active learning (AL), which serves as the representative label-efficient learning paradigm, has been widely applied in resource-constrained scenarios. The achievement of AL is attributed to acquisition functions, which are designed for…

Cryptography and Security · Computer Science 2025-08-11 Yuhan Zhi, Longtian Wang, Xiaofei Xie, Chao Shen, Qiang Hu, Xiaohong Guan

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts. This becomes even more important in today's software ecosystem, where vulnerable code can flow…

Software Engineering · Computer Science 2022-01-14 Yangruibo Ding, Sahil Suneja, Yunhui Zheng, Jim Laredo, Alessandro Morari, Gail Kaiser, Baishakhi Ray

Deep-Learning-based Vulnerability Detection in Binary Executables

The identification of vulnerabilities is an important element in the software development life cycle to ensure the security of software. While vulnerability identification based on the source code is a well studied field, the identification…

Cryptography and Security · Computer Science 2022-12-05 Andreas Schaad, Dominik Binder

An Empirical Study of Deep Learning Models for Vulnerability Detection

Deep learning (DL) models of code have recently reported great progress for vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many great models have been proposed, we do not yet have a…

Software Engineering · Computer Science 2023-02-14 Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, Wei Le

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

Vulnerability identification is crucial to protect the software systems from attacks for cyber security. It is especially important to localize the vulnerable functions among the source code to facilitate the fix. However, it is a…

Software Engineering · Computer Science 2019-09-10 Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, Yang Liu

Active WeaSuL: Improving Weak Supervision with Active Learning

The availability of labelled data is one of the main limitations in machine learning. We can alleviate this using weak supervision: a framework that uses expert-defined rules $\boldsymbol{\lambda}$ to estimate probabilistic labels…

Machine Learning · Computer Science 2021-05-03 Samantha Biegel, Rafah El-Khatib, Luiz Otavio Vilas Boas Oliveira, Max Baak, Nanne Aben

Vulnerability Detection with Code Language Models: How Far Are We?

In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing…

Software Engineering · Computer Science 2024-07-11 Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

Statement-Level Vulnerability Detection: Learning Vulnerability Patterns Through Information Theory and Contrastive Learning

Software vulnerabilities are a serious and crucial concern. Typically, in a program or function consisting of hundreds or thousands of source code statements, there are only a few statements causing the corresponding vulnerabilities. Most…

Cryptography and Security · Computer Science 2024-06-13 Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, Michael Fu, John Grundy, Hung Nguyen, Seyit Camtepe, Paul Quirk, Dinh Phung

Label-Efficient Point Cloud Semantic Segmentation: An Active Learning Approach

Deep learning models are the state-of-the-art methods for semantic point cloud segmentation, the success of which relies on the availability of large-scale annotated datasets. However, it can be extremely time-consuming and prohibitively…

Computer Vision and Pattern Recognition · Computer Science 2021-04-13 Xian Shi, Xun Xu, Ke Chen, Lile Cai, Chuan Sheng Foo, Kui Jia