Related papers: Detecting Code Clones with Graph Neural Networkand…
The increasing complexity of modern software systems has led to a rise in vulnerabilities that malicious actors can exploit. Traditional methods of vulnerability detection, such as static and dynamic analysis, have limitations in…
Detecting similar code fragments, usually referred to as code clones, is an important task. In particular, code clone detection can have significant uses in the context of vulnerability discovery, refactoring and plagiarism detection.…
Software comprehension can be extremely time-consuming due to the ever-growing size of codebases. Consequently, there is an increasing need to accelerate the code comprehension process to facilitate maintenance and reduce associated costs.…
Code Language Models (codeLMs) and Graph Neural Networks (GNNs) are widely used in code vulnerability detection. However, GNNs often rely on aggregating information from adjacent nodes, limiting structural information propagation across…
Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the…
Software development is a repetitive task, as developers usually reuse or get inspiration from existing implementations. Code search, which refers to the retrieval of relevant code snippets from a codebase according to the developer's…
One of the most significant challenges in the field of software code auditing is the presence of vulnerabilities in software source code. Every year, more and more software flaws are discovered, either internally in proprietary code or…
Code search is a widely used technique by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning…
Code smell is a great challenge in software refactoring, which indicates latent design or implementation flaws that may degrade the software maintainability and evolution. Over the past decades, a variety of refactoring approaches have been…
Summarizing source code into natural language descriptions (code summarization) helps developers better understand program functionality and reduce the burden of software maintenance. Abstract Syntax Trees (ASTs), as opposed to source code,…
This paper presents a new approach for code similarity on High Level programs. Our technique is based on Fast Dynamic Time Warping, that builds a warp path or points relation with local restrictions. The source code is represented into Time…
As code search permeates most activities in software development,code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for…
Abstract syntax trees (ASTs) play a crucial role in source code representation. However, due to the large number of nodes in an AST and the typically deep AST hierarchy, it is challenging to learn the hierarchical structure of an AST…
Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must…
Most machine learning and data analytics applications, including performance engineering in software systems, require a large number of annotations and labelled data, which might not be available in advance. Acquiring annotations often…
Software clones have been an active area of research for the past two decades. However, although numerous clone detection tools are now available, only a small fraction of the literature has focused on tool evaluation, and this is in fact…
In this paper, we focus on exploring effective methods for faster and accurate semantic segmentation. A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation. Two strategies…
Understanding the functional (dis)-similarity of source code is significant for code modeling tasks such as software vulnerability and code clone detection. We present DISCO(DIS-similarity of COde), a novel self-supervised model focusing on…
We investigate node representation learning on text-attributed graphs (TAGs), where nodes are associated with text information. Although recent studies on graph neural networks (GNNs) and pretrained language models (PLMs) have exhibited…
Structure encoding has proven to be the key feature to distinguishing links in a graph. However, Structure encoding in the temporal graph keeps changing as the graph evolves, repeatedly computing such features can be time-consuming due to…