Related papers: Pre-training by Predicting Program Dependencies fo…

DFEPT: Data Flow Embedding for Enhancing Pre-Trained Model Based Vulnerability Detection

Software vulnerabilities represent one of the most pressing threats to computing systems. Identifying vulnerabilities in source code is crucial for protecting user privacy and reducing economic losses. Traditional static analysis tools rely…

Software Engineering · Computer Science 2024-10-25 Zhonghao Jiang, Weifeng Sun, Xiaoyan Gu, Jiaxin Wu, Tao Wen, Haibo Hu, Meng Yan

CCT5: A Code-Change-Oriented Pre-Trained Model

Software is constantly changing, requiring developers to perform several derived tasks in a timely manner, such as writing a description for the intention of the code change, or identifying the defect-prone code changes. Considering that…

Software Engineering · Computer Science 2023-05-19 Bo Lin, Shangwen Wang, Zhongxin Liu, Yepang Liu, Xin Xia, Xiaoguang Mao

Identifying Non-Control Security-Critical Data through Program Dependence Learning

As control-flow protection gets widely deployed, it is difficult for attackers to corrupt control-data and achieve control-flow hijacking. Instead, data-oriented attacks, which manipulate non-control data, have been demonstrated to be…

Cryptography and Security · Computer Science 2024-05-03 Zhilong Wang, Haizhou Wang, Hong Hu, Peng Liu

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar, Romain Robbes

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

Multi-View Pre-Trained Model for Code Vulnerability Identification

Vulnerability identification is crucial for cyber security in the software-related industry. Early identification methods require significant manual efforts in crafting features or annotating vulnerable code. Although the recent pre-trained…

Software Engineering · Computer Science 2022-08-11 Xuxiang Jiang, Yinhao Xiao, Jun Wang, Wei Zhang

Automated Vulnerability Detection Using Deep Learning Technique

Our work explores the utilization of deep learning, specifically leveraging the CodeBERT model, to enhance code security testing for Python applications by detecting SQL injection vulnerabilities. Unlike traditional security testing methods…

Cryptography and Security · Computer Science 2025-08-29 Guan-Yan Yang, Yi-Heng Ko, Farn Wang, Kuo-Hui Yeh, Haw-Shiang Chang, Hsueh-Yi Chen

Revisiting Pre-trained Language Models for Vulnerability Detection

The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing…

Cryptography and Security · Computer Science 2025-11-25 Youpeng Li, Weiliang Qi, Xuyu Wang, Fuxun Yu, Xinda Wang

Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation

With the rapid development and widespread use of advanced network systems, software vulnerabilities pose a significant threat to secure communications and networking. Learning-based vulnerability detection systems, particularly those…

Cryptography and Security · Computer Science 2024-10-04 Weiliang Qi, Jiahao Cao, Darsh Poddar, Sophia Li, Xinda Wang

Cross Version Defect Prediction with Class Dependency Embeddings

Software Defect Prediction aims at predicting which software modules are the most probable to contain defects. The idea behind this approach is to save time during the development process by helping find bugs early. Defect Prediction models…

Software Engineering · Computer Science 2023-01-02 Moti Cohen, Lior Rokach, Rami Puzis

StagedVulBERT: Multi-Granular Vulnerability Detection with a Novel Pre-trained Code Model

The emergence of pre-trained model-based vulnerability detection methods has significantly advanced the field of automated vulnerability detection. However, these methods still face several challenges, such as difficulty in learning…

Cryptography and Security · Computer Science 2024-10-10 Yuan Jiang, Yujian Zhang, Xiaohong Su, Christoph Treude, Tiantian Wang

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether…

Computation and Language · Computer Science 2021-10-29 Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample

Understanding the Automated Parameter Optimization on Transfer Learning for CPDP: An Empirical Study

Data-driven defect prediction has become increasingly important in software engineering process. Since it is not uncommon that data from a software project is insufficient for training a reliable defect prediction model, transfer learning…

Neural and Evolutionary Computing · Computer Science 2020-02-11 Ke Li, Zilin Xiang, Tao Chen, Shuo Wang, Kay Chen Tan

Data Complexity-aware Deep Model Performance Forecasting

Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure…

Machine Learning · Computer Science 2026-01-06 Yen-Chia Chen, Hsing-Kuo Pao, Hanjuan Huang

Towards Developing and Analysing Metric-Based Software Defect Severity Prediction Model

In a critical software system, the testers have to spend an enormous amount of time and effort to maintain the software due to the continuous occurrence of defects. Among such defects, some severe defects may adversely affect the software.…

Software Engineering · Computer Science 2022-10-11 Umamaheswara Sharma B, Ravichandra Sadam

Simplification of Training Data for Cross-Project Defect Prediction

Cross-project defect prediction (CPDP) plays an important role in estimating the most likely defect-prone software components, especially for new or inactive projects. To the best of our knowledge, few prior studies provide explicit…

Software Engineering · Computer Science 2014-10-10 Peng He, Bing Li, Deguang Zhang, Yutao Ma

TRACED: Execution-aware Pre-training for Source Code

Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully…

Software Engineering · Computer Science 2023-06-14 Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning

Large-scale pre-trained models such as CodeBERT, GraphCodeBERT have earned widespread attention from both academia and industry. Attributed to the superior ability in code representation, they have been further applied in multiple…

Software Engineering · Computer Science 2023-01-24 Shangqing Liu, Bozhi Wu, Xiaofei Xie, Guozhu Meng, Yang Liu

Relate and Predict: Structure-Aware Prediction with Jointly Optimized Neural DAG

Understanding relationships between feature variables is one important way humans use to make decisions. However, state-of-the-art deep learning studies either focus on task-agnostic statistical dependency learning or do not model explicit…

Machine Learning · Computer Science 2021-03-04 Arshdeep Sekhon, Zhe Wang, Yanjun Qi