Related papers: Detecting Code Clones with Graph Neural Network and…

Enhancing Source Code Representations for Deep Learning with Static Analysis

Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text,…

Software Engineering · Computer Science 2024-02-16 Xueting Guan, Christoph Treude

funcGNN: A Graph Neural Network Approach to Program Similarity

Program similarity is a fundamental concept, central to the solution of software engineering tasks such as software plagiarism, clone identification, code refactoring and code search. Accurate similarity estimation between programs requires…

Machine Learning · Computer Science 2020-07-31 Aravind Nair, Avijit Roy, Karl Meinke

Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

The automated recognition of algorithm implementations can support many software maintenance and re-engineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like…

Software Engineering · Computer Science 2026-05-08 Denis Neumüller, Florian Sihler, Raphael Straub, Matthias Tichy

Code Clone Matching: A Practical and Effective Approach to Find Code Snippets

Finding the same or similar code snippets in source code is one of the fundamental activities in software maintenance. Text-based pattern matching tools such as grep are frequently used for this purpose, but making proper queries for the…

Software Engineering · Computer Science 2020-03-13 Katsuro Inoue, Yuya Miyamoto, Daniel M. German, Takashi Ishio

Source Code is a Graph, Not a Sequence: A Cross-Lingual Perspective on Code Clone Detection

Source code clone detection is the task of finding code fragments that have the same or similar functionality, but may differ in syntax or structure. This task is important for software maintenance, reuse, and quality assurance (Roy et al.…

Computation and Language · Computer Science 2023-12-29 Mohammed Ataaur Rahaman, Julia Ive

On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study

Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs…

Software Engineering · Computer Science 2023-12-27 Karthik Chandra Swarna, Noble Saji Mathews, Dheeraj Vagavolu, Sridhar Chimalakonda

AI-Driven Code Refactoring: Using Graph Neural Networks to Enhance Software Maintainability

This study explores Graph Neural Networks (GNNs) as a transformative tool for code refactoring, using abstract syntax trees (ASTs) to boost software maintainability. It analyzes a dataset of 2 million snippets from CodeSearchNet and a…

Artificial Intelligence · Computer Science 2025-04-15 Gopichand Bandarupalli

Code Clone Detection based on Event Embedding and Event Dependency

The code clone detection method based on semantic similarity has important value in software engineering tasks (e.g., software evolution, software reuse). Traditional code clone detection technologies pay more attention to the similarity of…

Software Engineering · Computer Science 2021-11-30 Cheng Huang, Hui Zhou, Chunyang Ye, Bingzhuo Li

Learning Program Semantics with Code Representations: An Empirical Study

Program semantics learning is the core and fundamental for various code intelligent tasks e.g., vulnerability detection, clone detection. A considerable amount of existing works propose diverse approaches to learn the program semantics for…

Software Engineering · Computer Science 2022-03-23 Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu

Improved Code Summarization via a Graph Neural Network

Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of…

Software Engineering · Computer Science 2020-04-08 Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

Automated Static Warning Identification via Path-based Semantic Representation

Despite their ability to aid developers in detecting potential defects early in the software development life cycle, static analysis tools often suffer from precision issues (i.e., high false positive rates of reported alarms). To improve…

Software Engineering · Computer Science 2024-01-22 Yuwei Zhang, Ying Xing, Ge Li, Zhi Jin

Clone-Seeker: Effective Code Clone Search Using Annotations

Source code search plays an important role in software development, e.g. for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is…

Software Engineering · Computer Science 2021-06-08 Muhammad Hammad, Önder Babur, Hamid Abdul Basit, Mark van den Brand

I Know Who Clones Your Code: Interpretable Smart Contract Similarity Detection

Widespread reuse of open-source code in smart contract development boosts programming efficiency but significantly amplifies bug propagation across contracts, while dedicated methods for detecting similar smart contract functions remain…

Software Engineering · Computer Science 2025-09-12 Zhenguang Liu, Lixun Ma, Zhongzheng Mu, Chengkun Wei, Xiaojun Xu, Yingying Jiao, Kui Ren

Detecting Semantic Clones of Unseen Functionality

Semantic code clone detection is the task of detecting whether two snippets of code implement the same functionality (e.g., Sort Array). Recently, many neural models achieved near-perfect performance on this task. These models seek to make…

Software Engineering · Computer Science 2025-12-02 Konstantinos Kitsios, Francesco Sovrano, Earl T. Barr, Alberto Bacchelli

Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification

Code classification is a difficult issue in program understanding and automatic coding. Due to the elusive syntax and complicated semantics in programs, most existing studies use techniques based on abstract syntax tree (AST) and graph…

Software Engineering · Computer Science 2025-09-25 Guang Yang, Tiancheng Jin, Liang Dou

A Machine Learning Based Framework for Code Clone Validation

A code clone is a pair of code fragments, within or between software systems that are similar. Since code clones often negatively impact the maintainability of a software system, several code clone detection techniques and tools have been…

Software Engineering · Computer Science 2020-05-05 Golam Mostaeen, Banani Roy, Chanchal Roy, Kevin Schneider, Jeffrey Svajlenko

SLACC: Simion-based Language Agnostic Code Clones

Successful cross-language clone detection could enable researchers and developers to create robust language migration tools, facilitate learning additional programming languages once one is mastered, and promote reuse of code snippets over…

Software Engineering · Computer Science 2020-02-11 George Mathew, Chris Parnin, Kathryn T Stolee

Source Code Comments: Overlooked in the Realm of Code Clone Detection

Reusing code can produce duplicate or near-duplicate code clones in code repositories. Current code clone detection techniques, like Program Dependence Graphs, rely on code structure and their dependencies to detect clones. These techniques…

Software Engineering · Computer Science 2020-06-26 Sandeep Kaur Kuttal, Akash Ghosh

CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection

With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement).…

Software Engineering · Computer Science 2024-05-02 Shihan Dou, Yueming Wu, Haoxiang Jia, Yuhao Zhou, Yan Liu, Yang Liu