Related papers: Unified Abstract Syntax Tree Representation Learni…

200 papers

Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

The lexical and syntactic disparities among different programming languages (e.g., Java and Python) pose significant challenges for multi-language software engineering tasks such as cross-language code clone detection and code retrieval,…

Software Engineering · Computer Science 2026-05-11 Junhao Chen, Jingxuan Zhang, Jian He, Yixuan Tang, Weiqin Zou

Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract…

Software Engineering · Computer Science 2021-12-01 Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

An effective and efficient encoding of the source code of a computer program is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and…

Artificial Intelligence · Computer Science 2021-11-16 Tenzin Jinpa, Yong Gao

Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two complementary representations of the same computer program. Traditionally, designers of machine learning models have relied predominantly either on Structure…

Machine Learning · Computer Science 2021-03-23 Daniel Zügner, Tobias Kirschstein, Michele Catasta, Jure Leskovec, Stephan Günnemann

Program source code contains complex structure information, which can be represented in structured data forms like trees or graphs. To acquire the structural information in source code, most existing researches use abstract syntax trees…

Software Engineering · Computer Science 2022-04-13 Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, Zhi Jin

Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been…

Software Engineering · Computer Science 2021-03-19 Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu

The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the…

Computation and Language · Computer Science 2022-09-13 José Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado, Houari Sahraoui

Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models.…

Computation and Language · Computer Science 2022-03-09 Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin

Code clones are semantically similar code fragments pairs that are syntactically similar or different. Detection of code clones can help to reduce the cost of software maintenance and prevent bugs. Numerous approaches of detecting code…

Software Engineering · Computer Science 2020-02-21 Wenhan Wang, Ge Li, Bo Ma, Xin Xia, Zhi Jin

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…

Software Engineering · Computer Science 2021-09-17 Siqi Han, DongXia Wang, Wanting Li, Xuesong Lu

Performance analysis has always been an afterthought during the application development process, focusing on application correctness first. The learning curve of the existing static and dynamic analysis tools are steep, which requires…

Machine Learning · Computer Science 2021-04-23 Nathan Pinnow, Tarek Ramadan, Tanzima Z. Islam, Chase Phelps, Jayaraman J. Thiagarajan

This paper presents Tree Notation, a new simple, universal syntax. Language designers can invent new programming languages, called Tree Languages, on top of Tree Notation. Tree Languages have a number of advantages over traditional…

Programming Languages · Computer Science 2017-10-25 Breck Yunits

The application of deep learning techniques in software engineering becomes increasingly popular. One key problem is developing high-quality and easy-to-use source code representations for code-related tasks. The research community has…

Software Engineering · Computer Science 2023-11-07 Zhiwei Xu, Min Zhou, Xibin Zhao, Yang Chen, Xi Cheng, Hongyu Zhang

During software maintenance, programmers spend a lot of time on code comprehension. Reading comments is an effective way for programmers to reduce the reading and navigating time when comprehending source code. Therefore, as a critical task…

Software Engineering · Computer Science 2018-02-01 Xing Hu, Yuhan Wei, Ge Li, Zhi Jin

Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs…

Computation and Language · Computer Science 2017-04-26 Maxim Rabinovich, Mitchell Stern, Dan Klein

Code summarization aims to generate brief natural language descriptions for source code. As source code is highly structured and follows strict programming language grammars, its Abstract Syntax Tree (AST) is often leveraged to inform the…

Computation and Language · Computer Science 2021-12-03 Ze Tang, Chuanyi Li, Jidong Ge, Xiaoyu Shen, Zheling Zhu, Bin Luo

Learning representation for source code is a foundation of many program analysis tasks. In recent years, neural networks have already shown success in this area, but most existing models did not make full use of the unique structural…

Software Engineering · Computer Science 2021-04-02 Wenhan Wang, Ge Li, Sijie Shen, Xin Xia, Zhi Jin

We introduce the MultiLang Code Parser Dataset (MLCPD), a large-scale, language-agnostic dataset unifying syntactic and structural representations of code across ten major programming languages. MLCPD contains over seven million parsed…

Software Engineering · Computer Science 2025-10-21 Jugal Gajjar, Kamalasankari Subramaniakuppusamy

Program semantics learning is the core and fundamental for various code intelligent tasks e.g., vulnerability detection, clone detection. A considerable amount of existing works propose diverse approaches to learn the program semantics for…

Software Engineering · Computer Science 2022-03-23 Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu