Related papers: CodeSAM: Source Code Representation Learning by In…

CodeSum: Translate Program Language to Natural Language

During software maintenance, programmers spend a lot of time on code comprehension. Reading comments is an effective way for programmers to reduce the reading and navigating time when comprehending source code. Therefore, as a critical task…

Software Engineering · Computer Science 2018-02-01 Xing Hu, Yuhan Wei, Ge Li, Zhi Jin

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

COMEX: A Tool for Generating Customized Source Code Representations

Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as…

Software Engineering · Computer Science 2023-07-11 Debeshee Das, Noble Saji Mathews, Alex Mathai, Srikanth Tamilselvam, Kranthi Sedamaki, Sridhar Chimalakonda, Atul Kumar

Enhancing Source Code Classification Effectiveness via Prompt Learning Incorporating Knowledge Features

Researchers have investigated the potential of leveraging pre-trained language models, such as CodeBERT, to enhance source code-related tasks. Previous methodologies have relied on CodeBERT's '[CLS]' token as the embedding representation of…

Computation and Language · Computer Science 2024-09-04 Yong Ma, Senlin Luo, Yu-Ming Shang, Yifei Zhang, Zhengjun Li

CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network

Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing…

Software Engineering · Computer Science 2021-07-06 Yanlin Wang, Ensheng Shi, Lun Du, Xiaodi Yang, Yuxuan Hu, Shi Han, Hongyu Zhang, Dongmei Zhang

ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization

(Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in promoting developers to understand and maintain code. Inspired by neural…

Software Engineering · Computer Science 2024-07-03 Chunrong Fang, Weisong Sun, Yuchen Chen, Xiao Chen, Zhao Wei, Quanjun Zhang, Yudu You, Bin Luo, Yang Liu, Zhenyu Chen

Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering

Software Engineering (SE) Pre-trained Language Models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has shown success in transferring into downstream tasks (e.g., code clone detection) through…

Software Engineering · Computer Science 2024-02-07 Iman Saberi, Fatemeh Fard, Fuxiang Chen

An Exploratory Study on Code Attention in BERT

Many recent models in software engineering introduced deep neural models based on the Transformer architecture or use transformer-based Pre-trained Language Models (PLM) trained on code. Although these models achieve the state of the arts…

Software Engineering · Computer Science 2022-04-22 Rishab Sharma, Fuxiang Chen, Fatemeh Fard, David Lo

Semantic Source Code Segmentation using Small and Large Language Models

Source code segmentation, dividing code into functionally coherent segments, is crucial for knowledge retrieval and maintenance in software development. While enabling efficient navigation and comprehension of large codebases, manual and…

Software Engineering · Computer Science 2025-07-15 Abdelhalim Dahou, Ansgar Scherp, Sebastian Kurten, Brigitte Mathiak, Madhu Chauhan

How to get better embeddings with code pre-trained models? An empirical study

Pre-trained language models have demonstrated powerful capabilities in the field of natural language processing (NLP). Recently, code pre-trained model (PTM), which draw from the experiences of the NLP field, have also achieved…

Software Engineering · Computer Science 2023-11-15 Yu Zhao, Lina Gong, Haoxiang Zhang, Yaoshen Yu, Zhiqiu Huang

CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse…

Computation and Language · Computer Science 2026-04-28 Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code

There are several approaches for encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate…

Software Engineering · Computer Science 2023-02-02 Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate…

Software Engineering · Computer Science 2024-03-21 Zhihong Sun, Chen Lyu, Bolun Li, Yao Wan, Hongyu Zhang, Ge Li, Zhi Jin

CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation

Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the large language models, their specificity in code generation can still be improved due to…

Software Engineering · Computer Science 2025-05-20 Kounianhua Du, Jizheng Chen, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code

Understanding source code is a topic of great interest in the software engineering community, since it can help programmers in various tasks such as software maintenance and reuse. Recent advances in large language models (LLMs) have…

Software Engineering · Computer Science 2025-04-25 Michele Carissimi, Martina Saletta, Claudio Ferretti

Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models

Large language models (LLMs) make remarkable progress in reasoning tasks. Among different reasoning modes, inductive reasoning, due to its better alignment with human learning, attracts increasing interest. However, research on inductive…

Computation and Language · Computer Science 2025-10-17 Kedi Chen, Zhikai Lei, Xu Guo, Xuecheng Wu, Siyuan Zeng, Jianghao Yin, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Qipeng Guo, Kai Chen, Wei Zhang

GypSum: Learning Hybrid Representations for Code Summarization

Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principle in neural machine translation and adopt the encoder-decoder framework, where…

Software Engineering · Computer Science 2022-04-28 Yu Wang, Yu Dong, Xuesong Lu, Aoying Zhou

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional…

Machine Learning · Computer Science 2023-11-28 Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica

Code Semantic Zooming

Recent advances in Large Language Models (LLMs) have introduced a new paradigm for software development, where source code is generated from natural language prompts. While this paradigm significantly boosts development productivity,…

Human-Computer Interaction · Computer Science 2026-05-06 Jinsheng Ba, Sverrir Thorgeirsson, Zhendong Su

Code Summarization with Structure-induced Transformer

Code summarization (CS) is becoming a promising area in recent language understanding, which aims to generate sensible human language automatically for programming language in the format of source code, serving in the most convenience of…

Computation and Language · Computer Science 2021-06-02 Hongqiu Wu, Hai Zhao, Min Zhang