Related papers: Detecting Code Clones with Graph Neural Network and…

200 papers

The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces…

Software Engineering · Computer Science 2025-04-25 Jorge Martinez-Gil

When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to…

Software Engineering · Computer Science 2018-07-17 Chungha Sung, Shuvendu Lahiri, Constantin Enea, Chao Wang

The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the…

Computation and Language · Computer Science 2022-09-13 José Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado, Houari Sahraoui

This study proposes a deep learning-based approach for discovering loops in programming code according to their potential for parallelization. Two genetic algorithm-based code generators were developed to produce two distinct types of code:…

Machine Learning · Computer Science 2025-10-03 Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira

Graph similarity learning, crucial for tasks such as graph classification and similarity search, focuses on measuring the similarity between two graph-structured entities. The core challenge in this field is effectively managing the…

Information Retrieval · Computer Science 2025-02-26 Zenghui Chang, Yiqiao Zhang, Hong Cai Chen

The growing demand for automated graph algorithm reasoning has attracted increasing attention in the large language model (LLM) community. Recent LLM-based graph reasoning methods typically decouple task descriptions from graph data,…

Software Engineering · Computer Science 2026-03-10 Fali Wang, Chenglin Weng, Xianren Zhang, Siyuan Hong, Hui Liu, Suhang Wang

With the involvement of multiple programming languages in modern software development, cross-lingual code clone detection has gained traction within the software engineering community. Numerous studies have explored this topic, proposing…

Software Engineering · Computer Science 2025-05-07 Micheline Bénédicte Moumoula, Abdoul Kader Kabore, Jacques Klein, Tegawendé Bissyande

Deep learning is widely used to uncover hidden patterns in large code corpora. To achieve this, constructing a format that captures the relevant characteristics and features of source code is essential. Graph-based representations have…

Software Engineering · Computer Science 2024-02-01 Mootez Saad, Tushar Sharma

Vector space representations of words capture many aspects of word similarity, but such methods tend to make vector spaces in which antonyms (as well as synonyms) are close to each other. We present a new signed spectral normalized graph…

Computation and Language · Computer Science 2016-01-21 João Sedoc, Jean Gallier, Lyle Ungar, Dean Foster

We address the problem of creating entire and complete maps of software code clones (copy features in data) in a corpus of binary artifacts of unknown provenance. We report on a practical methodology, which employs enhanced suffix data…

Cryptography and Security · Computer Science 2014-07-11 William Casey, Aaron Shelmire

When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools were built to recommend clone-removal refactorings based on…

Software Engineering · Computer Science 2018-07-31 Ruru Yue, Zhe Gao, Na Meng, Yingfei Xiong, Xiaoyin Wang, J. David Morgenthaler

Semantic code search is about finding semantically relevant code snippets for a given natural language query. In the state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance of their…

Software Engineering · Computer Science 2022-01-14 Jian Gu, Zimin Chen, Martin Monperrus

Developers introduce code clones to improve programming productivity. Many existing studies have achieved impressive performance in monolingual code clone detection. However, during software development, more and more developers write…

Software Engineering · Computer Science 2023-09-08 Jia Li, Chongyang Tao, Zhi Jin, Fang Liu, Jia Li, Ge Li

Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, vulnerability detection, etc. This, in turn, poses security and reliability risks to these models. One…

Software Engineering · Computer Science 2024-11-01 Thanh-Dat Nguyen, Yang Zhou, Xuan Bach D. Le, Patanamon Thongtanunam, David Lo

Completeness of a knowledge graph is an important quality dimension and factor on how well an application that makes use of it performs. Completeness can be improved by performing knowledge enrichment. Duplicate detection aims to find…

Databases · Computer Science 2022-07-21 Juliette Opdenplatz, Umutcan Şimşek, Dieter Fensel

In Computer Science (CS) education, understanding factors contributing to students' programming difficulties is crucial for effective learning support. By identifying specific issues students face, educators can provide targeted assistance…

Embedding models have demonstrated strong performance in tasks like clustering, retrieval, and feature extraction while offering computational advantages over generative models and cross-encoders. Benchmarks such as MTEB have shown that…

Software Engineering · Computer Science 2025-08-28 Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu

Deep learning has been used in program analysis for predicting hidden software defects using software defect datasets, detecting security vulnerabilities using generative adversarial networks, and identifying syntax errors by learning a…

Software Engineering · Computer Science 2019-07-16 Venkatesh Theru Mohan, Ali Jannesari

Large pre-trained language models have been used to generate code, providing a flexible interface for synthesizing programs from natural language specifications. However, they often violate syntactic and semantic rules of their output…

Machine Learning · Computer Science 2022-01-28 Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, Sumit Gulwani

Source code clones pose risks ranging from intellectual property violations to unintended vulnerabilities. Effective and efficient scalable clone detection, especially for diverged clones, remains challenging. Large language models (LLMs)…

Software Engineering · Computer Science 2025-10-20 Muslim Chochlov, Gul Aftab Ahmed, James Vincent Patten, Yuanhua Han, Guoxian Lu, David Gregg, Jim Buckley