Related papers: Detecting Code Clones with Graph Neural Network and…

Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures

The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces…

Software Engineering · Computer Science 2025-04-25 Jorge Martinez-Gil

Datalog-based Scalable Semantic Diffing of Concurrent Programs

When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to…

Software Engineering · Computer Science 2018-07-17 Chungha Sung, Shuvendu Lahiri, Constantin Enea, Chao Wang

AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models

The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the…

Computation and Language · Computer Science 2022-09-13 José Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado, Houari Sahraoui

Discovering Software Parallelization Points Using Deep Neural Networks

This study proposes a deep learning-based approach for discovering loops in programming code according to their potential for parallelization. Two genetic algorithm-based code generators were developed to produce two distinct types of code:…

Machine Learning · Computer Science 2025-10-03 Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira

Neural Network Graph Similarity Computation Based on Graph Fusion

Graph similarity learning, crucial for tasks such as graph classification and similarity search, focuses on measuring the similarity between two graph-structured entities. The core challenge in this field is effectively managing the…

Information Retrieval · Computer Science 2025-02-26 Zenghui Chang, Yiqiao Zhang, Hong Cai Chen

GraphSkill: Documentation-Guided Hierarchical Retrieval-Augmented Coding for Complex Graph Reasoning

The growing demand for automated graph algorithm reasoning has attracted increasing attention in the large language model (LLM) community. Recent LLM-based graph reasoning methods typically decouple task descriptions from graph data,…

Software Engineering · Computer Science 2026-03-10 Fali Wang, Chenglin Weng, Xianren Zhang, Siyuan Hong, Hui Liu, Suhang Wang

The Struggles of LLMs in Cross-lingual Code Clone Detection

With the involvement of multiple programming languages in modern software development, cross-lingual code clone detection has gained traction within the software engineering community. Numerous studies have explored this topic, proposing…

Software Engineering · Computer Science 2025-05-07 Micheline Bénédicte Moumoula, Abdoul Kader Kabore, Jacques Klein, Tegawendé Bissyande

CONCORD: Towards a DSL for Configurable Graph Code Representation

Deep learning is widely used to uncover hidden patterns in large code corpora. To achieve this, constructing a format that captures the relevant characteristics and features of source code is essential. Graph-based representations have…

Software Engineering · Computer Science 2024-02-01 Mootez Saad, Tushar Sharma

Semantic Word Clusters Using Signed Normalized Graph Cuts

Vector space representations of words capture many aspects of word similarity, but such methods tend to make vector spaces in which antonyms (as well as synonyms) are close to each other. We present a new signed spectral normalized graph…

Computation and Language · Computer Science 2016-01-21 João Sedoc, Jean Gallier, Lyle Ungar, Dean Foster

Signature Limits: An Entire Map of Clone Features and their Discovery in Nearly Linear Time

We address the problem of creating entire and complete maps of software code clones (copy features in data) in a corpus of binary artifacts of unknown provenance. We report on a practical methodology, which employs enhanced suffix data…

Cryptography and Security · Computer Science 2014-07-11 William Casey, Aaron Shelmire

Automatic Clone Recommendation for Refactoring Based on the Present and the Past

When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools were built to recommend clone-removal refactorings based on…

Software Engineering · Computer Science 2018-07-31 Ruru Yue, Zhe Gao, Na Meng, Yingfei Xiong, Xiaoyin Wang, J. David Morgenthaler

Multimodal Representation for Neural Code Search

Semantic code search is about finding semantically relevant code snippets for a given natural language query. In the state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance of their…

Software Engineering · Computer Science 2022-01-14 Jian Gu, Zimin Chen, Martin Monperrus

ZC3: Zero-Shot Cross-Language Code Clone Detection

Developers introduce code clones to improve programming productivity. Many existing studies have achieved impressive performance in monolingual code clone detection. However, during software development, more and more developers write…

Software Engineering · Computer Science 2023-09-08 Jia Li, Chongyang Tao, Zhi Jin, Fang Liu, Jia Li, Ge Li

Adversarial Attacks on Code Models with Discriminative Graph Patterns

Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, vulnerability detection, etc. This, in turn, poses security and reliability risks to these models. One…

Software Engineering · Computer Science 2024-11-01 Thanh-Dat Nguyen, Yang Zhou, Xuan Bach D. Le, Patanamon Thongtanunam, David Lo

Duplicate Detection as a Service

Completeness of a knowledge graph is an important quality dimension and factor on how well an application that makes use of it performs. Completeness can be improved by performing knowledge enrichment. Duplicate detection aims to find…

Databases · Computer Science 2022-07-21 Juliette Opdenplatz, Umutcan Şimşek, Dieter Fensel

Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions

In Computer Science (CS) education, understanding factors contributing to students' programming difficulties is crucial for effective learning support. By identifying specific issues students face, educators can provide targeted assistance…

Machine Learning · Computer Science 2026-04-02 Muntasir Hoq, Ananya Rao, Reisha Jaishankar, Krish Piryani, Nithya Janapati, Jessica Vandenberg, Bradford Mott, Narges Norouzi, James Lester, Bita Akram

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking

Embedding models have demonstrated strong performance in tasks like clustering, retrieval, and feature extraction while offering computational advantages over generative models and cross-encoders. Benchmarks such as MTEB have shown that…

Software Engineering · Computer Science 2025-08-28 Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu

Automatic Repair and Type Binding of Undeclared Variables using Neural Networks

Deep learning had been used in program analysis for the prediction of hidden software defects using software defect datasets, security vulnerabilities using generative adversarial networks as well as identifying syntax errors by learning a…

Software Engineering · Computer Science 2019-07-16 Venkatesh Theru Mohan, Ali Jannesari

Synchromesh: Reliable code generation from pre-trained language models

Large pre-trained language models have been used to generate code, providing a flexible interface for synthesizing programs from natural language specifications. However, they often violate syntactic and semantic rules of their output…

Machine Learning · Computer Science 2022-01-28 Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, Sumit Gulwani

Selecting and Combining Large Language Models for Scalable Code Clone Detection

Source code clones pose risks ranging from intellectual property violations to unintended vulnerabilities. Effective and efficient scalable clone detection, especially for diverged clones, remains challenging. Large language models (LLMs)…

Software Engineering · Computer Science 2025-10-20 Muslim Chochlov, Gul Aftab Ahmed, James Vincent Patten, Yuanhua Han, Guoxian Lu, David Gregg, Jim Buckley