Related papers: Code Representation Learning with Pr\"ufer Sequenc…
Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…
Learning vector representations for programs is a critical step in applying deep learning techniques for program understanding tasks. Various neural network models are proposed to learn from tree-structured program representations, e.g.,…
Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract…
Program classification can be regarded as a high-level abstraction of code, laying a foundation for various tasks related to source code comprehension, and has a very wide range of applications in the field of software engineering, such as…
Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…
Code summarization aims to generate brief natural language descriptions for source code. As source code is highly structured and follows strict programming language grammars, its Abstract Syntax Tree (AST) is often leveraged to inform the…
The application of deep learning techniques in software engineering becomes increasingly popular. One key problem is developing high-quality and easy-to-use source code representations for code-related tasks. The research community has…
Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…
Program source code contains complex structure information, which can be represented in structured data forms like trees or graphs. To acquire the structural information in source code, most existing researches use abstract syntax trees…
Performance analysis has always been an afterthought during the application development process, focusing on application correctness first. The learning curve of the existing static and dynamic analysis tools are steep, which requires…
Predicting program properties such as names or expression types has a wide range of applications. It can ease the task of programming and increase programmer productivity. A major challenge when learning from programs is $\textit{how to…
The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the…
Learning representation for source code is a foundation of many program analysis tasks. In recent years, neural networks have already shown success in this area, but most existing models did not make full use of the unique structural…
Transformer-based models have demonstrated significant success in various source code representation tasks. Nonetheless, traditional positional embeddings employed by these models inadequately capture the hierarchical structure intrinsic to…
Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two complementary representations of the same computer program. Traditionally, designers of machine learning models have relied predominantly either on Structure…
Pre-trained models for programming language have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, code summarization, etc. However, existing pre-trained models regard a code…
Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text,…
Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs…
Code representation learning, which aims to encode the semantics of source code into distributed vectors, plays an important role in recent deep-learning-based models for code intelligence. Recently, many pre-trained language models for…
Source code summarization aims at generating concise and clear natural language descriptions for programming languages. Well-written code summaries are beneficial for programmers to participate in the software development and maintenance…