Related papers: Demystifying What Code Summarization Models Learne…

Towards Understanding What Code Language Models Learned

Pre-trained language models are effective in a variety of natural language tasks, but it has been argued their capabilities fall short of fully learning meaning or understanding language. To understand the extent to which language models…

Software Engineering · Computer Science 2024-02-29 Toufique Ahmed, Dian Yu, Chengxuan Huang, Cathy Wang, Prem Devanbu, Kenji Sagae

A Neural Model for Generating Natural Language Summaries of Program Subroutines

Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance.…

Software Engineering · Computer Science 2019-02-07 Alexander LeClair, Siyuan Jiang, Collin McMillan

Statement-based Memory for Neural Source Code Summarization

Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program…

Artificial Intelligence · Computer Science 2023-07-24 Aakash Bansal, Siyuan Jiang, Sakib Haque, Collin McMillan

Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems

Neural models excel at extracting statistical patterns from large amounts of data, but struggle to learn patterns or reason about language from only a few examples. In this paper, we ask: Can we learn explicit rules that generalize well…

Computation and Language · Computer Science 2021-06-15 Saujas Vaduguru, Aalok Sathe, Monojit Choudhury, Dipti Misra Sharma

Understanding Code Semantics: An Evaluation of Transformer Models in Summarization

This paper delves into the intricacies of code summarization using advanced transformer-based language models. Through empirical studies, we evaluate the efficacy of code summarization by altering function and variable names to explore…

Machine Learning · Computer Science 2023-10-30 Debanjan Mondal, Abhilasha Lodha, Ankita Sahoo, Beena Kumari

Towards Modeling Human Attention from Eye Movements for Neural Source Code Summarization

Neural source code summarization is the task of generating natural language descriptions of source code behavior using neural networks. A fundamental component of most neural models is an attention mechanism. The attention mechanism learns…

Software Engineering · Computer Science 2023-05-18 Aakash Bansal, Bonita Sharif, Collin McMillan

How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing

Under special circumstances, summaries should conform to a particular style with patterns, such as court judgments and abstracts in academic papers. To this end, the prototype document-summary pairs can be utilized to generate better…

Computation and Language · Computer Science 2019-09-20 Shen Gao, Xiuying Chen, Piji Li, Zhangming Chan, Dongyan Zhao, Rui Yan

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

Physics of Language Models: Part 1, Learning Hierarchical Language Structures

Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. Previous research has primarily explored how these models handle simple tasks like name…

Computation and Language · Computer Science 2025-05-20 Zeyuan Allen-Zhu, Yuanzhi Li

Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization

Recent language models have demonstrated proficiency in summarizing source code. However, as in many other domains of machine learning, language models of code lack sufficient explainability. Informally, we lack a formulaic or intuitive…

Software Engineering · Computer Science 2024-02-23 Jiliang Li, Yifan Zhang, Zachary Karas, Collin McMillan, Kevin Leach, Yu Huang

DPS: Design Pattern Summarisation Using Code Features

Automatic summarisation has been used efficiently in recent years to condense texts, conversations, audio, code, and various other artefacts. A range of methods, from simple template-based summaries to complex machine learning techniques --…

Software Engineering · Computer Science 2025-12-08 Najam Nazar, Sameer Sikka, Christoph Treude

Emergent Representations of Program Semantics in Language Models Trained on Programs

We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of…

Machine Learning · Computer Science 2024-08-06 Charles Jin, Martin Rinard

On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers

Identifiers make up a majority of the text in code. They are one of the most basic mediums through which developers describe the code they create and understand the code that others create. Therefore, understanding the patterns latent in…

Software Engineering · Computer Science 2020-07-17 Christian D. Newman, Reem S. AlSuhaibani, Michael J. Decker, Anthony Peruma, Dishant Kaushik, Mohamed Wiem Mkaouer, Emily Hill

A Transformer-based Approach for Source Code Summarization

Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their…

Software Engineering · Computer Science 2020-05-05 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Code Summarization with Structure-induced Transformer

Code summarization (CS) is becoming a promising area in recent language understanding, which aims to generate sensible human language automatically for programming language in the format of source code, serving in the most convenience of…

Computation and Language · Computer Science 2021-06-02 Hongqiu Wu, Hai Zhao, Min Zhang

A Prompt Learning Framework for Source Code Summarization

(Source) code summarization is the task of automatically generating natural language summaries (also called comments) for given code snippets. Recently, with the successful application of large language models (LLMs) in numerous fields,…

Software Engineering · Computer Science 2024-12-10 Tingting Xu, Yun Miao, Chunrong Fang, Hanwei Qian, Xia Feng, Zhenpeng Chen, Chong Wang, Jian Zhang, Weisong Sun, Zhenyu Chen, Yang Liu

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers

Pre-trained models of source code have recently been successfully applied to a wide variety of Software Engineering tasks; they have also seen some practical adoption in practice, e.g. for code completion. Yet, we still know very little…

Software Engineering · Computer Science 2023-12-11 Anjan Karmakar, Romain Robbes

A Convolutional Neural Network for Language-Agnostic Source Code Summarization

Descriptive comments play a crucial role in the software engineering process. They decrease development time, enable better bug detection, and facilitate the reuse of previously written code. However, comments are commonly the last of a…

Computation and Language · Computer Science 2019-04-02 Jessica Moore, Ben Gelman, David Slater

Summarization Techniques for Pattern Collections in Data Mining

Discovering patterns from data is an important task in data mining. There exist techniques to find large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A…

Databases · Computer Science 2007-05-23 Taneli Mielikäinen

Pattern-Based Classification: A Unifying Perspective

The use of patterns in predictive models is a topic that has received a lot of attention in recent years. Pattern mining can help to obtain models for structured domains, such as graphs and sequences, and has been proposed as a means to…

Artificial Intelligence · Computer Science 2011-11-29 Björn Bringmann, Siegfried Nijssen, Albrecht Zimmermann