Related papers: SparseCoder: Identifier-Aware Sparse Transformer f…

SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning

As software projects rapidly evolve, software artifacts become more complex and the defects behind them get harder to identify. The emerging Transformer-based approaches, though achieving remarkable performance, struggle with long code sequences due…

Software Engineering · Computer Science 2024-09-13 Xueqi Yang, Mariusz Jakubowski, Li Kang, Haojie Yu, Tim Menzies

A Transformer-based Approach for Source Code Summarization

Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their…

Software Engineering · Computer Science 2020-05-05 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

LongCoder: A Long-Range Pre-trained Language Model for Code Completion

In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a sliding window mechanism for…

Software Engineering · Computer Science 2023-06-27 Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley

Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis

Sympiler is a domain-specific code generator that optimizes sparse matrix computations by decoupling the symbolic analysis phase from the numerical manipulation stage in sparse codes. The computation patterns in sparse numerical methods are…

Programming Languages · Computer Science 2018-01-08 Kazem Cheshmi, Shoaib Kamil, Michelle Mills Strout, Maryam Mehri Dehnavi

AST-MHSA: Code Summarization using Multi-Head Self-Attention

Code summarization aims to generate concise natural language descriptions for source code. The prevailing approaches adopt transformer-based encoder-decoder architectures, where the Abstract Syntax Tree (AST) of the source code is utilized…

Computation and Language · Computer Science 2023-08-11 Yeshwanth Nagaraj, Ujjwal Gupta

Sparse Attention-Based Neural Networks for Code Classification

Categorizing source codes accurately and efficiently is a challenging problem in real-world programming education platform management. In recent years, model-based approaches utilizing abstract syntax trees (ASTs) have been widely applied…

Programming Languages · Computer Science 2023-11-14 Ziyang Xiang, Zaixi Zhang, Qi Liu

Low-Resources Project-Specific Code Summarization

Code summarization generates brief natural language descriptions of source code pieces, which can assist developers in understanding code and reduce documentation workload. Recent neural models on code summarization are trained and…

Software Engineering · Computer Science 2022-10-24 Rui Xie, Tianxiang Hu, Wei Ye, Shikun Zhang

Code Structure Guided Transformer for Source Code Summarization

Code summaries help developers comprehend programs and reduce their time to infer the program functionalities during software maintenance. Recent efforts resort to deep learning techniques such as sequence-to-sequence models for generating…

Computation and Language · Computer Science 2023-02-08 Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lun Yiu Nie, Xin Xia, Michael R. Lyu

Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems

Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization. Training and inference using large transformer models can be computationally expensive. Previous work has focused on one…

Computation and Language · Computer Science 2021-09-10 Potsawee Manakul, Mark J. F. Gales

Revisiting File Context for Source Code Summarization

Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code…

Software Engineering · Computer Science 2023-09-06 Aakash Bansal, Chia-Yi Su, Collin McMillan

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Transformer has become ubiquitous in the deep learning field. One of the key ingredients that destined its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its…

Computation and Language · Computer Science 2021-06-08 Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

Project-Level Encoding for Neural Source Code Summarization of Subroutines

Source code summarization of a subroutine is the task of writing a short, natural language description of that subroutine. The description usually serves in documentation aimed at programmers, where even a brief phrase (e.g. "compresses data…

Software Engineering · Computer Science 2021-03-23 Aakash Bansal, Sakib Haque, Collin McMillan

Statement-based Memory for Neural Source Code Summarization

Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program…

Artificial Intelligence · Computer Science 2023-07-24 Aakash Bansal, Siyuan Jiang, Sakib Haque, Collin McMillan

In-Context Compositional Learning via Sparse Coding Transformer

Transformer architectures have achieved remarkable success across language, vision, and multimodal tasks, and there is growing demand for them to address in-context compositional learning tasks. In these tasks, models solve the target…

Machine Learning · Computer Science 2025-11-26 Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has…

Software Engineering · Computer Science 2026-01-15 Sicong Liu, Yanxian Huang, Mingwei Liu, Jiachi Chen, Ensheng Shi, Yuchi Ma, Hongyu Zhang, Yin Zhang, Yanlin Wang

Understanding Long Programming Languages with Structure-Aware Sparse Attention

Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically…

Computation and Language · Computer Science 2022-05-30 Tingting Liu, Chengyu Wang, Cen Chen, Ming Gao, Aoying Zhou

Predicting Attention Sparsity in Transformers

Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax. An alternative path, used by entmax transformers, consists of having built-in exact…

Computation and Language · Computer Science 2022-04-22 Marcos Treviso, António Góis, Patrick Fernandes, Erick Fonseca, André F. T. Martins

StructCoder: Structure-Aware Transformer for Code Generation

There has been a recent surge of interest in automating software engineering tasks using deep learning. This paper addresses the problem of code generation, where the goal is to generate target code given source code in a different language…

Machine Learning · Computer Science 2024-02-01 Sindhu Tipirneni, Ming Zhu, Chandan K. Reddy

TranS^3: A Transformer-based Framework for Unifying Code Summarization and Code Search

Code summarization and code search have been widely adopted in software development and maintenance. However, few studies have explored the efficacy of unifying them. In this paper, we propose TranS^3, a transformer-based framework to integrate…

Software Engineering · Computer Science 2020-03-10 Wenhua Wang, Yuqun Zhang, Zhengran Zeng, Guandong Xu

CodeSum: Translate Program Language to Natural Language

During software maintenance, programmers spend a lot of time on code comprehension. Reading comments is an effective way for programmers to reduce the reading and navigating time when comprehending source code. Therefore, as a critical task…

Software Engineering · Computer Science 2018-02-01 Xing Hu, Yuhan Wei, Ge Li, Zhi Jin