Related papers: Learning Program Semantics with Code Representatio…

A Comparison of Code Embeddings and Beyond

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…

Software Engineering · Computer Science 2021-09-17 Siqi Han , DongXia Wang , Wanting Li , Xuesong Lu

A Survey of Source Code Representations for Machine Learning-Based Cybersecurity Tasks

Machine learning techniques for cybersecurity-related software engineering tasks are becoming increasingly popular. The representation of source code is a key portion of the technique that can impact the way the model is able to learn the…

Machine Learning · Computer Science 2025-04-10 Beatrice Casey , Joanna C. S. Santos , George Perry

On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study

Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs…

Software Engineering · Computer Science 2023-12-27 Karthik Chandra Swarna , Noble Saji Mathews , Dheeraj Vagavolu , Sridhar Chimalakonda

Semantic Code Graph -- an information model to facilitate software comprehension

Software comprehension can be extremely time-consuming due to the ever-growing size of codebases. Consequently, there is an increasing need to accelerate the code comprehension process to facilitate maintenance and reduce associated costs.…

Software Engineering · Computer Science 2024-01-15 Krzysztof Borowski , Bartosz Baliś , Tomasz Orzechowski

Universal Representation for Code

Learning from source code usually requires a large amount of labeled data. Despite the possible scarcity of labeled data, the trained model is highly task-specific and lacks transferability to different tasks. In this work, we present…

Machine Learning · Computer Science 2021-03-05 Linfeng Liu , Hoan Nguyen , George Karypis , Srinivasan Sengamedu

Evaluation of Contrastive Learning with Various Code Representations for Code Clone Detection

Code clones are pairs of code snippets that implement similar functionality. Clone detection is a fundamental branch of automatic source code comprehension, having many applications in refactoring recommendation, plagiarism detection, and…

Software Engineering · Computer Science 2022-06-20 Maksim Zubkov , Egor Spirin , Egor Bogomolov , Timofey Bryksin

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation,…

Machine Learning · Computer Science 2018-11-30 Tal Ben-Nun , Alice Shoshana Jakobovits , Torsten Hoefler

Graph-Based Decoding for Task Oriented Semantic Parsing

The dominant paradigm for semantic parsing in recent years is to formulate parsing as a sequence-to-sequence task, generating predictions with auto-regressive sequence decoders. In this work, we explore an alternative paradigm. We formulate…

Computation and Language · Computer Science 2023-03-24 Jeremy R. Cole , Nanjiang Jiang , Panupong Pasupat , Luheng He , Peter Shaw

TreeCaps: Tree-Structured Capsule Networks for Program Source Code Processing

Program comprehension is a fundamental task in software development and maintenance processes. Software developers often need to understand a large amount of existing code before they can develop new features or fix bugs in existing…

Machine Learning · Computer Science 2019-10-29 Vinoj Jayasundara , Nghi Duy Quoc Bui , Lingxiao Jiang , David Lo

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun , Chunrong Fang , Yun Miao , Yudu You , Mengzhe Yuan , Yuchen Chen , Quanjun Zhang , An Guo , Xiang Chen , Yang Liu , Zhenyu Chen

What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning

Recent successes in training word embeddings for NLP tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that…

Software Engineering · Computer Science 2020-02-10 Patrick Keller , Laura Plein , Tegawendé F. Bissyandé , Jacques Klein , Yves Le Traon

Dynamic Neural Program Embedding for Program Repair

Neural program embeddings have shown much promise recently for a variety of program analysis tasks, including program synthesis, program repair, fault localization, etc. However, most existing program embeddings are based on syntactic…

Artificial Intelligence · Computer Science 2018-07-03 Ke Wang , Rishabh Singh , Zhendong Su

Comparison of Syntactic and Semantic Representations of Programs in Neural Embeddings

Neural approaches to program synthesis and understanding have proliferated widely in the last few years; at the same time graph based neural networks have become a promising new tool. This work aims to be the first empirical study comparing…

Software Engineering · Computer Science 2020-01-28 Austin P. Wright , Herbert Wiklicky

Learning to Represent Programs with Heterogeneous Graphs

Program source code contains complex structure information, which can be represented in structured data forms like trees or graphs. To acquire the structural information in source code, most existing researches use abstract syntax trees…

Software Engineering · Computer Science 2022-04-13 Kechi Zhang , Wenhan Wang , Huangzhao Zhang , Ge Li , Zhi Jin

A General Path-Based Representation for Predicting Program Properties

Predicting program properties such as names or expression types has a wide range of applications. It can ease the task of programming and increase programmer productivity. A major challenge when learning from programs is $\textit{how to…

Programming Languages · Computer Science 2018-04-24 Uri Alon , Meital Zilberstein , Omer Levy , Eran Yahav

Towards Semantic Clone Detection via Probabilistic Software Modeling

Semantic clones are program components with similar behavior, but different textual representation. Semantic similarity is hard to detect, and semantic clone detection is still an open issue. We present semantic clone detection via…

Software Engineering · Computer Science 2020-01-22 Hannes Thaller , Lukas Linsbauer , Alexander Egyed

Software Vulnerability Analysis Across Programming Language and Program Representation Landscapes: A Survey

Modern software systems are developed in diverse programming languages and often harbor critical vulnerabilities that attackers can exploit to compromise security. These vulnerabilities have been actively targeted in real-world attacks,…

Cryptography and Security · Computer Science 2025-03-27 Zhuoyun Qian , Fangtian Zhong , Qin Hu , Yili Jiang , Jiaqi Huang , Mengfei Ren , Jiguo Yu

TreeCaps: Tree-Based Capsule Networks for Source Code Processing

Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). Although graphs may be better at capturing…

Software Engineering · Computer Science 2020-12-15 Nghi D. Q. Bui , Yijun Yu , Lingxiao Jiang

Emergent Representations of Program Semantics in Language Models Trained on Programs

We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of…

Machine Learning · Computer Science 2024-08-06 Charles Jin , Martin Rinard

PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis

The remarkable growth and significant success of machine learning have expanded its applications into programming languages and program analysis. However, a key challenge in adopting the latest machine learning methods is the representation…

Programming Languages · Computer Science 2023-12-01 Ali TehraniJamsaz , Quazi Ishtiaque Mahmud , Le Chen , Nesreen K. Ahmed , Ali Jannesari