Related papers: Coding-PTMs: How to Find Optimal Code Pre-trained …

How to get better embeddings with code pre-trained models? An empirical study

Pre-trained language models have demonstrated powerful capabilities in the field of natural language processing (NLP). Recently, code pre-trained model (PTM), which draw from the experiences of the NLP field, have also achieved…

Software Engineering · Computer Science 2023-11-15 Yu Zhao , Lina Gong , Haoxiang Zhang , Yaoshen Yu , Zhiqiu Huang

Revisiting Pre-trained Language Models for Vulnerability Detection

The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing…

Cryptography and Security · Computer Science 2025-11-25 Youpeng Li , Weiliang Qi , Xuyu Wang , Fuxun Yu , Xinda Wang

How to Select Pre-Trained Code Models for Reuse? A Learning Perspective

Pre-training a language model and then fine-tuning it has shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However,…

Software Engineering · Computer Science 2025-01-08 Zhangqian Bi , Yao Wan , Zhaoyang Chu , Yufei Hu , Junyi Zhang , Hongyu Zhang , Guandong Xu , Hai Jin

Multi-View Pre-Trained Model for Code Vulnerability Identification

Vulnerability identification is crucial for cyber security in the software-related industry. Early identification methods require significant manual efforts in crafting features or annotating vulnerable code. Although the recent pre-trained…

Software Engineering · Computer Science 2022-08-11 Xuxiang Jiang , Yinhao Xiao , Jun Wang , Wei Zhang

A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks

Pre-trained models (PTMs) have achieved great success in various Software Engineering (SE) downstream tasks following the ``pre-train then fine-tune'' paradigm. As fully fine-tuning all parameters of PTMs can be computationally expensive, a…

Software Engineering · Computer Science 2023-12-27 Wentao Zou , Qi Li , Jidong Ge , Chuanyi Li , Xiaoyu Shen , Liguo Huang , Bin Luo

The Struggles of LLMs in Cross-lingual Code Clone Detection

With the involvement of multiple programming languages in modern software development, cross-lingual code clone detection has gained traction within the software engineering community. Numerous studies have explored this topic, proposing…

Software Engineering · Computer Science 2025-05-07 Micheline Bénédicte Moumoula , Abdoul Kader Kabore , Jacques Klein , Tegawendé Bissyande

Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization

Bug localization refers to the identification of source code files which is in a programming language and also responsible for the unexpected behavior of software using the bug report, which is a natural language. As bug localization is…

Software Engineering · Computer Science 2024-06-26 Partha Chakraborty , Venkatraman Arumugam , Meiyappan Nagappan

LLM-based Vulnerability Discovery through the Lens of Code Metrics

Large language models (LLMs) excel in many tasks of software engineering, yet progress in leveraging them for vulnerability discovery has stalled in recent years. To understand this phenomenon, we investigate LLMs through the lens of…

Cryptography and Security · Computer Science 2025-09-24 Felix Weissberg , Lukas Pirch , Erik Imgrund , Jonas Möller , Thorsten Eisenhofer , Konrad Rieck

CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure

Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. To interpret these models, some probing methods have been applied. However, these methods fail to consider the inherent characteristics…

Software Engineering · Computer Science 2022-12-13 Nuo Chen , Qiushi Sun , Renyu Zhu , Xiang Li , Xuesong Lu , Ming Gao

Optimizing Code Embeddings and ML Classifiers for Python Source Code Vulnerability Detection

In recent years, the growing complexity and scale of source code have rendered manual software vulnerability detection increasingly impractical. To address this challenge, automated approaches leveraging machine learning and code embeddings…

Software Engineering · Computer Science 2025-09-17 Talaya Farasat , Joachim Posegga

Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction

Software fault prediction model are employed to optimize testing resource allocation by identifying fault-prone classes before testing phases. Several researchers' have validated the use of different classification techniques to develop…

Software Engineering · Computer Science 2017-04-17 Lov Kumar , Santanu Rath , Ashish Sureka

A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection

As large language models (LLMs) are increasingly adopted for code vulnerability detection, their reliability and robustness across diverse vulnerability types have become a pressing concern. In traditional adversarial settings, code…

Cryptography and Security · Computer Science 2025-12-19 Xiao Li , Yue Li , Hao Wu , Yue Zhang , Yechao Zhang , Fengyuan Xu , Sheng Zhong

Unveiling Memorization in Code Models

The availability of large-scale datasets, advanced architectures, and powerful computational resources have led to effective code models that automate diverse software engineering activities. The datasets usually consist of billions of…

Software Engineering · Computer Science 2024-01-15 Zhou Yang , Zhipeng Zhao , Chenyu Wang , Jieke Shi , Dongsun Kim , DongGyun Han , David Lo

A Comparison of Different Source Code Representation Methods for Vulnerability Prediction in Python

In the age of big data and machine learning, at a time when the techniques and methods of software development are evolving rapidly, a problem has arisen: programmers can no longer detect all the security flaws and vulnerabilities in their…

Software Engineering · Computer Science 2021-08-05 Amirreza Bagheri , Péter Hegedűs

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…

Cryptography and Security · Computer Science 2025-02-14 Karl Tamberg , Hayretdin Bahsi

Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning

Code Pre-trained Models (CodePTMs) based vulnerability detection have achieved promising results over recent years. However, these models struggle to generalize as they typically learn superficial mapping from source code to labels instead…

Cryptography and Security · Computer Science 2024-06-07 Xiaohu Du , Ming Wen , Jiahao Zhu , Zifan Xie , Bin Ji , Huijun Liu , Xuanhua Shi , Hai Jin

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar , Romain Robbes

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

While automated vulnerability detection techniques have made promising progress in detecting security vulnerabilities, their scalability and applicability remain challenging. The remarkable performance of Large Language Models (LLMs), such…

Cryptography and Security · Computer Science 2024-10-24 Avishree Khare , Saikat Dutta , Ziyang Li , Alaia Solko-Breslin , Rajeev Alur , Mayur Naik

Risk Assessment Framework for Code LLMs via Leveraging Internal States

The pre-training paradigm plays a key role in the success of Large Language Models (LLMs), which have been recognized as one of the most significant advancements of AI recently. Building on these breakthroughs, code LLMs with advanced…

Software Engineering · Computer Science 2025-04-22 Yuheng Huang , Lei Ma , Keizaburo Nishikino , Takumi Akazaki

Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

The significant increase in software production driven by automation and faster development lifecycles has resulted in a corresponding surge in software vulnerabilities. In parallel, the evolving landscape of software vulnerability…

Cryptography and Security · Computer Science 2024-08-30 Yuejun Guo , Constantinos Patsakis , Qiang Hu , Qiang Tang , Fran Casino