Related papers: Learning to Represent Programs with Heterogeneous …

Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification

Code classification is a difficult issue in program understanding and automatic coding. Due to the elusive syntax and complicated semantics in programs, most existing studies use techniques based on abstract syntax tree (AST) and graph…

Software Engineering · Computer Science 2025-09-25 Guang Yang, Tiancheng Jin, Liang Dou

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

Assessing the Effectiveness of Syntactic Structure to Learn Code Edit Representations

In recent times, it has been shown that one can use code as data to aid various applications such as automatic commit message generation, automatic generation of pull request descriptions and automatic program repair. Take for instance the…

Machine Learning · Computer Science 2021-06-14 Syed Arbaaz Qureshi, Sonu Mehta, Ranjita Bhagwan, Rahul Kumar

Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks

Deep learning is being used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the required process of feature engineering, the construction of…

Software Engineering · Computer Science 2021-11-24 Zhehao Zhao, Bo Yang, Ge Li, Huai Liu, Zhi Jin

Modular Tree Network for Source Code Representation Learning

Learning representation for source code is a foundation of many program analysis tasks. In recent years, neural networks have already shown success in this area, but most existing models did not make full use of the unique structural…

Software Engineering · Computer Science 2021-04-02 Wenhan Wang, Ge Li, Sijie Shen, Xin Xia, Zhi Jin

Program Classification Using Gated Graph Attention Neural Network for Online Programming Service

The online programming services, such as GitHub, TopCoder, and EduCoder, have promoted a lot of social interactions among the service users. However, the existing social interactions are rather limited and inefficient due to the rapid…

Artificial Intelligence · Computer Science 2019-03-12 Mingming Lu, Dingwu Tan, Naixue Xiong, Zailiang Chen, Haifeng Li

On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study

Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs…

Software Engineering · Computer Science 2023-12-27 Karthik Chandra Swarna, Noble Saji Mathews, Dheeraj Vagavolu, Sridhar Chimalakonda

Learning Program Representations with a Tree-Structured Transformer

Learning vector representations for programs is a critical step in applying deep learning techniques for program understanding tasks. Various neural network models are proposed to learn from tree-structured program representations, e.g.,…

Software Engineering · Computer Science 2023-01-10 Wenhan Wang, Kechi Zhang, Ge Li, Shangqing Liu, Anran Li, Zhi Jin, Yang Liu

Language-Agnostic Representation Learning of Source Code from Structure and Context

Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two complementary representations of the same computer program. Traditionally, designers of machine learning models have relied predominantly either on Structure…

Machine Learning · Computer Science 2021-03-23 Daniel Zügner, Tobias Kirschstein, Michele Catasta, Jure Leskovec, Stephan Günnemann

Learning to Represent Programs with Graphs

Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code's known syntax. For…

Machine Learning · Computer Science 2018-05-08 Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi

A Comparison of Code Embeddings and Beyond

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…

Software Engineering · Computer Science 2021-09-17 Siqi Han, DongXia Wang, Wanting Li, Xuesong Lu

Improved Code Summarization via a Graph Neural Network

Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of…

Software Engineering · Computer Science 2020-04-08 Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan

Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer program is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and…

Artificial Intelligence · Computer Science 2021-11-16 Tenzin Jinpa, Yong Gao

Abstract Syntax Networks for Code Generation and Semantic Parsing

Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs…

Computation and Language · Computer Science 2017-04-26 Maxim Rabinovich, Mitchell Stern, Dan Klein

Enhancing Source Code Representations for Deep Learning with Static Analysis

Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text,…

Software Engineering · Computer Science 2024-02-16 Xueting Guan, Christoph Treude

Unified Abstract Syntax Tree Representation Learning for Cross-Language Program Classification

Program classification can be regarded as a high-level abstraction of code, laying a foundation for various tasks related to source code comprehension, and has a very wide range of applications in the field of software engineering, such as…

Software Engineering · Computer Science 2022-05-03 Kesu Wang, Meng Yan, He Zhang, Haibo Hu

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees

Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract…

Software Engineering · Computer Science 2021-12-01 Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

Comparative Code Structure Analysis using Deep Learning for Performance Prediction

Performance analysis has always been an afterthought during the application development process, focusing on application correctness first. The learning curve of the existing static and dynamic analysis tools is steep, which requires…

Machine Learning · Computer Science 2021-04-23 Nathan Pinnow, Tarek Ramadan, Tanzima Z. Islam, Chase Phelps, Jayaraman J. Thiagarajan

Learning to Extend Program Graphs to Work-in-Progress Code

Source code spends most of its time in a broken or incomplete state during software development. This presents a challenge to machine learning for code, since high-performing models typically rely on graph structured representations of…

Machine Learning · Computer Science 2021-06-01 Xuechen Li, Chris J. Maddison, Daniel Tarlow

Detecting Code Vulnerabilities with Heterogeneous GNN Training

Detecting vulnerabilities in source code is a critical task for software security assurance. Graph Neural Network (GNN) machine learning can be a promising approach by modeling source code as graphs. Early approaches treated code elements…

Cryptography and Security · Computer Science 2025-02-25 Yu Luo, Weifeng Xu, Dianxiang Xu