Related papers: Enhancing Neural Code Representation with Addition…

Encoding Version History Context for Better Code Representation

With the exponential growth of AI tools that generate source code, understanding software has become crucial. When developers comprehend a program, they may refer to additional contexts to look for information, e.g. program documentation or…

Software Engineering · Computer Science 2024-02-07 Huy Nguyen, Christoph Treude, Patanamon Thongtanunam

Adding Context to Source Code Representations for Deep Learning

Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code…

Software Engineering · Computer Science 2022-08-02 Fuwei Tian, Christoph Treude

CL4SE: Benchmarking Context Learning on Software Engineering

Context engineering has emerged as a pivotal paradigm for unlocking the potential of Large Language Models (LLMs) in Software Engineering (SE) tasks, enabling performance gains at test time without model fine-tuning. Despite its success,…

Software Engineering · Computer Science 2026-04-07 Haichuan Hu, Quanjun Zhang, Ye Shang, Guoqing Xie, Chunrong Fang, Zhenyu Chen, Liang Xiao

Ensemble Models for Neural Source Code Summarization of Subroutines

A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of…

Software Engineering · Computer Science 2021-07-27 Alexander LeClair, Aakash Bansal, Collin McMillan

CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs

Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source…

Software Engineering · Computer Science 2024-11-25 Alex Mathai, Kranthi Sedamaki, Debeshee Das, Noble Saji Mathews, Srikanth Tamilselvam, Sridhar Chimalakonda, Atul Kumar

On the Effect of Semantically Enriched Context Models on Software Modularization

Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies…

Software Engineering · Computer Science 2017-08-08 Amir Saeidi, Jurriaan Hage, Ravi Khadka, Slinger Jansen

Meta Learning for Code Summarization

Source code summarization is the task of generating a high-level natural language description for a segment of programming language code. Current neural models for the task differ in their architecture and the aspects of code they consider.…

Machine Learning · Computer Science 2022-01-21 Moiz Rauf, Sebastian Padó, Michael Pradel

Learning code summarization from a small and local dataset

Foundation models (e.g., CodeBERT, GraphCodeBERT, CodeT5) work well for many software engineering tasks. These models are pre-trained (using self-supervision) with billions of code tokens, and then fine-tuned with hundreds of thousands of…

Software Engineering · Computer Science 2022-06-03 Toufique Ahmed, Premkumar Devanbu

Code Summarization Beyond Function Level

Code summarization is a critical task in natural language processing and software engineering, which aims to generate concise descriptions of source code. Recent advancements have improved the quality of these summaries, enhancing code…

Computation and Language · Computer Science 2025-02-25 Vladimir Makharev, Vladimir Ivanov

ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization

(Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in promoting developers to understand and maintain code. Inspired by neural…

Software Engineering · Computer Science 2024-07-03 Chunrong Fang, Weisong Sun, Yuchen Chen, Xiao Chen, Zhao Wei, Quanjun Zhang, Yudu You, Bin Luo, Yang Liu, Zhenyu Chen

Contrastive Code Representation Learning

Recent work learns contextual representations of source code by reconstructing tokens from their context. For downstream semantic understanding tasks like summarizing code in English, these representations should ideally capture program…

Machine Learning · Computer Science 2022-01-10 Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica

Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough?

Automated code summarization is a long-standing goal for code comprehension. This task automatically generates documentation using a given method. Deep Learning (DL)-based approaches have been proven beneficial for various software…

Software Engineering · Computer Science 2025-02-12 Antonio Vitale, Antonio Mastropaolo, Rocco Oliveto, Massimiliano Di Penta, Simone Scalabrino

Sequence Shortening for Context-Aware Machine Translation

Context-aware Machine Translation aims to improve translations of sentences by incorporating surrounding sentences as context. Towards this task, two main architectures have been applied, namely single-encoder (based on concatenation) and…

Computation and Language · Computer Science 2024-02-05 Paweł Mąka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis

Deep Learning-based Code Completion: On the Impact on Performance of Contextual Information

Code completion aims at speeding up code writing by recommending to developers the next tokens they are likely to type. Deep Learning (DL) models pushed the boundaries of code completion by redefining what these coding assistants can do: We…

Software Engineering · Computer Science 2025-01-10 Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

CoreGen: Contextualized Code Representation Learning for Commit Message Generation

Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers' works and coordination. However, the semantic gap between source code and natural language poses a major challenge for…

Computation and Language · Computer Science 2021-06-22 Lun Yiu Nie, Cuiyun Gao, Zhicong Zhong, Wai Lam, Yang Liu, Zenglin Xu

Enhancing Source Code Classification Effectiveness via Prompt Learning Incorporating Knowledge Features

Researchers have investigated the potential of leveraging pre-trained language models, such as CodeBERT, to enhance source code-related tasks. Previous methodologies have relied on CodeBERT's '[CLS]' token as the embedding representation of…

Computation and Language · Computer Science 2024-09-04 Yong Ma, Senlin Luo, Yu-Ming Shang, Yifei Zhang, Zhengjun Li

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks including automatic code completion which is a staple in a developer's toolkit. While many have striven to improve the…

Computation and Language · Computer Science 2023-04-25 Tim van Dam, Maliheh Izadi, Arie van Deursen

On the Evaluation of Neural Code Summarization

Source code summaries are important for program comprehension and maintenance. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically…

Software Engineering · Computer Science 2022-02-14 Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun

Improved Code Summarization via a Graph Neural Network

Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of…

Software Engineering · Computer Science 2020-04-08 Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan

Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks

Learning-based techniques, especially advanced pre-trained models for code have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training…

Software Engineering · Computer Science 2025-02-07 Kyi Shin Khant, Hong Yi Lin, Patanamon Thongtanunam