Related papers: Detecting Code Clones with Graph Neural Network and…

A Graph-Based Semantics Workbench for Concurrent Asynchronous Programs

A number of novel programming languages and libraries have been proposed that offer simpler-to-use models of concurrency than threads. It is challenging, however, to devise execution models that successfully realise their abstractions…

Software Engineering · Computer Science 2016-03-24 Claudio Corrodi, Alexander Heußner, Christopher M. Poskitt

Towards Human-interpretable Explanation in Code Clone Detection using LLM-based Post Hoc Explainer

Recent studies highlight various machine learning (ML)-based techniques for code clone detection, which can be integrated into developer tools such as static code analysis. With the advancements brought by ML in code understanding, ML-based…

Software Engineering · Computer Science 2025-09-30 Teeradaj Racharak, Chaiyong Ragkhitwetsagul, Chayanee Junplong, Akara Supratak

Capturing Fine-grained Semantics in Contrastive Graph Representation Learning

Graph contrastive learning defines a contrastive task to pull similar instances close and push dissimilar instances away. It learns discriminative node embeddings without supervised labels, which has aroused increasing attention in the past…

Machine Learning · Computer Science 2023-04-25 Lin Shu, Chuan Chen, Zibin Zheng

Development and Benchmarking of Multilingual Code Clone Detector

The diversity of programming languages is growing, making the language extensibility of code clone detectors crucial. However, this is challenging for most existing clone detectors because the source code handler needs…

Software Engineering · Computer Science 2024-09-18 Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya, Eunjong Choi, Hiroaki Takada

Graph Conditioned Sparse-Attention for Improved Source Code Understanding

Transformer architectures have been successfully used in learning source code representations. The fusion between a graph representation like Abstract Syntax Tree (AST) and a source code sequence makes the use of current approaches…

Machine Learning · Computer Science 2021-12-06 Junyan Cheng, Iordanis Fostiropoulos, Barry Boehm

AST-T5: Structure-Aware Pretraining for Code Generation and Understanding

Large language models (LLMs) have made significant advancements in code-related tasks, yet many LLMs treat code as simple sequences, neglecting its structured nature. We introduce AST-T5, a novel pretraining paradigm that leverages the…

Software Engineering · Computer Science 2024-06-25 Linyuan Gong, Mostafa Elhoushi, Alvin Cheung

The pragmatics of clone detection and elimination

The occurrence of similar code, or 'code clones', can make program code difficult to read, modify and maintain. This paper describes industrial case studies of clone detection and elimination using a refactoring and clone detection tool. We…

Software Engineering · Computer Science 2017-04-03 Simon Thompson, Huiqing Li, Andreas Schumacher

TRACED: Execution-aware Pre-training for Source Code

Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully…

Software Engineering · Computer Science 2023-06-14 Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep…

Artificial Intelligence · Computer Science 2021-09-09 Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, Jim Laredo

MSCCD: Grammar Pluggable Clone Detection Based on ANTLR Parser Generation

For various reasons, programming languages continue to multiply and evolve. It has become necessary to have a multilingual clone detection tool that can easily expand supported programming languages and detect various code clones.…

Software Engineering · Computer Science 2022-04-07 Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya, Eunjong Choi, Hiroaki Takada

Logical Segmentation of Source Code

Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting…

Software Engineering · Computer Science 2019-07-23 Jacob Dormuth, Ben Gelman, Jessica Moore, David Slater

Custom-Tailored Clone Detection for IEC 61131-3 Programming Languages

Automated production systems (aPS) are highly customized systems that consist of hardware and software. Such aPS are controlled by a programmable logic controller (PLC), often in accordance with the IEC 61131-3 standard that divides system…

Software Engineering · Computer Science 2021-08-24 Kamil Rosiak, Alexander Schlie, Lukas Linsbauer, Birgit Vogel-Heuser, Ina Schaefer

More Separable and Easier to Segment: A Cluster Alignment Method for Cross-Domain Semantic Segmentation

Feature alignment between domains is one of the mainstream methods for Unsupervised Domain Adaptation (UDA) semantic segmentation. Existing feature alignment methods for semantic segmentation learn domain-invariant features by adversarial…

Computer Vision and Pattern Recognition · Computer Science 2021-05-10 Shuang Wang, Dong Zhao, Yi Li, Chi Zhang, Yuwei Guo, Qi Zang, Biao Hou, Licheng Jiao

AST-Based Deep Learning for Detecting Malicious PowerShell

With the celebrated success of deep learning, some attempts to develop effective methods for detecting malicious PowerShell programs employ neural nets in a traditional natural language processing setup while others employ convolutional…

Software Engineering · Computer Science 2018-10-23 Gili Rusak, Abdullah Al-Dujaili, Una-May O'Reilly

Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets

Binary code similarity detection is an important problem with applications in areas such as malware analysis, vulnerability research and license violation detection. This paper proposes a novel graph neural network architecture combined…

Cryptography and Security · Computer Science 2024-11-13 Joshua Collyer, Tim Watson, Iain Phillips

Leveraging Code Clones and Natural Language Processing for Log Statement Prediction

Software developers embed logging statements inside the source code as an imperative duty in modern software development as log files are necessary for tracking down runtime system issues and troubleshooting system management tasks. Prior…

Software Engineering · Computer Science 2021-09-10 Sina Gholamian

What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning

Recent successes in training word embeddings for NLP tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that…

Software Engineering · Computer Science 2020-02-10 Patrick Keller, Laura Plein, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

To detect large-variance code clones (i.e. clones with relatively more differences) in large-scale code repositories is difficult because most current tools can only detect almost identical or very similar clones. It will make promotion and…

Software Engineering · Computer Science 2019-09-11 Ming Wu, Pengcheng Wang, Kangqi Yin, Haoyu Cheng, Yun Xu, Chanchal K. Roy

Efficiently Clustering Very Large Attributed Graphs

Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic…

Social and Information Networks · Computer Science 2017-08-29 Alessandro Baroni, Alessio Conte, Maurizio Patrignani, Salvatore Ruggieri

Survey of Code Search Based on Deep Learning

Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given query by effectively capturing the semantic…

Software Engineering · Computer Science 2023-12-14 Yutao Xie, Jiayi Lin, Hande Dong, Lei Zhang, Zhonghai Wu