Related papers: Introducing Enriched Concrete Syntax Trees

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen

xASTNN: Improved Code Representations for Industrial Practice

The application of deep learning techniques in software engineering becomes increasingly popular. One key problem is developing high-quality and easy-to-use source code representations for code-related tasks. The research community has…

Software Engineering · Computer Science 2023-11-07 Zhiwei Xu, Min Zhou, Xibin Zhao, Yang Chen, Xi Cheng, Hongyu Zhang

Structural Embedding of Syntactic Trees for Machine Comprehension

Deep neural networks for machine comprehension typically utilize only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we…

Computation and Language · Computer Science 2017-09-04 Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees

Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract…

Software Engineering · Computer Science 2021-12-01 Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

Tree Notation: an antifragile program notation

This paper presents Tree Notation, a new simple, universal syntax. Language designers can invent new programming languages, called Tree Languages, on top of Tree Notation. Tree Languages have a number of advantages over traditional…

Programming Languages · Computer Science 2017-10-25 Breck Yunits

eRST: A Signaled Graph Theory of Discourse Relations and Organization

In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse…

Computation and Language · Computer Science 2024-08-29 Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler

Concrete Syntax with Black Box Parsers

Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type…

Programming Languages · Computer Science 2019-02-05 Rodin Aarssen, Jurgen Vinju, Tijs van der Storm

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

AST-Transformer: Encoding Abstract Syntax Trees Efficiently for Code Summarization

Code summarization aims to generate brief natural language descriptions for source code. As source code is highly structured and follows strict programming language grammars, its Abstract Syntax Tree (AST) is often leveraged to inform the…

Computation and Language · Computer Science 2021-12-03 Ze Tang, Chuanyi Li, Jidong Ge, Xiaoyu Shen, Zheling Zhu, Bin Luo

Problems in Systematic Application of Software Metrics and Possible Solution

Systematic application of software metric techniques can lead to significant improvements in the quality of a final software product. However, there is still an evident lack of wider utilization of software metrics techniques and tools due…

Software Engineering · Computer Science 2013-11-18 Gordana Rakic, Zoran Budimac

Automatic Source Code Summarization with Extended Tree-LSTM

Neural machine translation models are used to automatically generate a document from given source code since this can be regarded as a machine translation task. Source code summarization is one of the components for automatic document…

Machine Learning · Computer Science 2019-06-24 Yusuke Shido, Yasuaki Kobayashi, Akihiro Yamamoto, Atsushi Miyamoto, Tadayuki Matsumura

Code Summarization with Structure-induced Transformer

Code summarization (CS) is becoming a promising area in recent language understanding, which aims to generate sensible human language automatically for programming language in the format of source code, serving in the most convenience of…

Computation and Language · Computer Science 2021-06-02 Hongqiu Wu, Hai Zhao, Min Zhang

Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance

This paper revisits recent code similarity evaluation metrics, particularly focusing on the application of Abstract Syntax Tree (AST) editing distance in diverse programming languages. In particular, we explore the usefulness of these…

Computation and Language · Computer Science 2025-06-06 Yewei Song, Cedric Lothritz, Daniel Tang, Tegawendé F. Bissyandé, Jacques Klein

Comparative Code Structure Analysis using Deep Learning for Performance Prediction

Performance analysis has always been an afterthought during the application development process, focusing on application correctness first. The learning curve of the existing static and dynamic analysis tools is steep, which requires…

Machine Learning · Computer Science 2021-04-23 Nathan Pinnow, Tarek Ramadan, Tanzima Z. Islam, Chase Phelps, Jayaraman J. Thiagarajan

Interpretable Structure-aware Document Encoders with Hierarchical Attention

We propose a method to create document representations that reflect their internal structure. We modify Tree-LSTMs to hierarchically merge basic elements such as words and sentences into blocks of increasing complexity. Our Structure…

Computation and Language · Computer Science 2019-10-08 Khalil Mrini, Claudiu Musat, Michael Baeriswyl, Martin Jaggi

Unified Abstract Syntax Tree Representation Learning for Cross-Language Program Classification

Program classification can be regarded as a high-level abstraction of code, laying a foundation for various tasks related to source code comprehension, and has a very wide range of applications in the field of software engineering, such as…

Software Engineering · Computer Science 2022-05-03 Kesu Wang, Meng Yan, He Zhang, Haibo Hu

Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting

Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been…

Software Engineering · Computer Science 2021-03-19 Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li, Rongxin Wu

Bringing Structure to Naturalness: On the Naturalness of ASTs

Source code comes in different shapes and forms. Previous research has already shown code to be more predictable than natural language as well as highlighted its statistical predictability at the token level: source code can be natural.…

Software Engineering · Computer Science 2025-04-14 Profir-Petru Pârţachi, Mahito Sugiyama

Autoencoders as Tools for Program Synthesis

Recently there have been many advances in research on language modeling of source code. Applications range from code suggestion and completion to code summarization. However, complete program synthesis of industry-grade programming…

Artificial Intelligence · Computer Science 2021-09-07 Sander de Bruin, Vadim Liventsev, Milan Petković

On Tree-Based Neural Sentence Modeling

Neural networks with tree-based sentence encoders have shown better results on many downstream tasks. Most existing tree-based encoders adopt syntactic parsing trees as the explicit structure prior. To study the effectiveness of…

Computation and Language · Computer Science 2018-08-30 Haoyue Shi, Hao Zhou, Jiaze Chen, Lei Li