Related papers: CodeEditor: Learning to Edit Source Code with Pre-…

Code Execution with Pre-trained Language Models

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and…

Programming Languages · Computer Science 2023-05-10 Chenxiao Liu , Shuai Lu , Weizhu Chen , Daxin Jiang , Alexey Svyatkovskiy , Shengyu Fu , Neel Sundaresan , Nan Duan

Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing

Developers often dedicate significant time to maintaining and refactoring existing code. However, most prior work on generative models for code focuses solely on creating new code, overlooking the distinctive needs of editing existing code.…

Software Engineering · Computer Science 2024-04-30 Jiayi Wei , Greg Durrett , Isil Dillig

CoditT5: Pretraining for Source Code and Natural Language Editing

Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel…

Software Engineering · Computer Science 2022-09-15 Jiyang Zhang , Sheena Panthaplackel , Pengyu Nie , Junyi Jessy Li , Milos Gligoric

Automating Code Review Activities by Large-Scale Pre-training

Code review is an essential part to software development lifecycle since it aims at guaranteeing the quality of codes. Modern code review activities necessitate developers viewing, understanding and even running the programs to assess…

Software Engineering · Computer Science 2022-10-12 Zhiyu Li , Shuai Lu , Daya Guo , Nan Duan , Shailesh Jannu , Grant Jenks , Deep Majumder , Jared Green , Alexey Svyatkovskiy , Shengyu Fu , Neel Sundaresan

Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code

Few-shot learning with large-scale, pre-trained language models is a powerful way to answer questions about code, e.g., how to complete a given code example, or even generate code snippets from scratch. The success of these models raises…

Software Engineering · Computer Science 2022-06-14 Patrick Bareiß , Beatriz Souza , Marcelo d'Amorim , Michael Pradel

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang , Zhouyang Jia , Shanshan Li , Yue Yu , Yun Xiong , Wei Dong , Xiangke Liao

CCT5: A Code-Change-Oriented Pre-Trained Model

Software is constantly changing, requiring developers to perform several derived tasks in a timely manner, such as writing a description for the intention of the code change, or identifying the defect-prone code changes. Considering that…

Software Engineering · Computer Science 2023-05-19 Bo Lin , Shangwen Wang , Zhongxin Liu , Yepang Liu , Xin Xia , Xiaoguang Mao

EditLord: Learning Code Transformation Rules for Code Editing

Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code's intended functionality. Existing approaches often…

Software Engineering · Computer Science 2025-07-11 Weichen Li , Albert Jan , Baishakhi Ray , Junfeng Yang , Chengzhi Mao , Kexin Pei

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers

Pre-trained models of source code have recently been successfully applied to a wide variety of Software Engineering tasks; they have also seen some practical adoption in practice, e.g. for code completion. Yet, we still know very little…

Software Engineering · Computer Science 2023-12-11 Anjan Karmakar , Romain Robbes

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan , Wei Zhao , Hongyu Zhang , Yulei Sui , Guandong Xu , Hai Jin

CODIT: Code Editing with Tree-Based Neural Models

The way developers edit day-to-day code tends to be repetitive, often using existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates which are applied to limited…

Software Engineering · Computer Science 2022-04-21 Saikat Chakraborty , Yangruibo Ding , Miltiadis Allamanis , Baishakhi Ray

CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking

Transformer based code models have impressive performance in many software engineering tasks. However, their effectiveness degrades when symbols are missing or not informative. The reason is that the model may not learn to pay attention to…

Software Engineering · Computer Science 2024-11-22 Zian Su , Xiangzhe Xu , Ziyang Huang , Zhuo Zhang , Yapeng Ye , Jianjun Huang , Xiangyu Zhang

InstructCoder: Instruction Tuning Large Language Models for Code Editing

Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due…

Computation and Language · Computer Science 2024-02-29 Kaixin Li , Qisheng Hu , Xu Zhao , Hui Chen , Yuxi Xie , Tiedong Liu , Qizhe Xie , Junxian He

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar , Romain Robbes

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large…

Software Engineering · Computer Science 2023-04-12 Ensheng Shi , Yanlin Wang , Hongyu Zhang , Lun Du , Shi Han , Dongmei Zhang , Hongbin Sun

Directional Diffusion-Style Code Editing Pre-training

Code pre-trained models have shown promising effectiveness in various software engineering tasks. Among these tasks, many tasks are related to software evolution and/or code editing. However, existing code pre-trained models often overlook…

Software Engineering · Computer Science 2025-12-11 Qingyuan Liang , Zeyu Sun , Qihao Zhu , Junhao Hu , Yifan Zhao , Yizhou Chen , Mingxuan Zhu , Guoqing Wang , Lu Zhang

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal , Yikang Shen , Bailin Wang , Yoon Kim , Jie Chen

Neural Networks for Modeling Source Code Edits

Programming languages are emerging as a challenging and interesting domain for machine learning. A core task, which has received significant attention in recent years, is building generative models of source code. However, to our knowledge,…

Machine Learning · Computer Science 2019-04-08 Rui Zhao , David Bieber , Kevin Swersky , Daniel Tarlow

No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence

Pre-trained models have been shown effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpus and then fine-tuned in downstream tasks. However, as the inputs to pre-training and downstream tasks…

Software Engineering · Computer Science 2022-07-26 Chaozheng Wang , Yuanhang Yang , Cuiyun Gao , Yun Peng , Hongyu Zhang , Michael R. Lyu

SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations

Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. But there are issues surrounding their application…

Software Engineering · Computer Science 2022-05-26 Changan Niu , Chuanyi Li , Vincent Ng , Jidong Ge , Liguo Huang , Bin Luo