Related papers: CCT5: A Code-Change-Oriented Pre-Trained Model

CoditT5: Pretraining for Source Code and Natural Language Editing

Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel…

Software Engineering · Computer Science 2022-09-15 Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

Probing Pretrained Models of Source Code

Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully built to solve a particular code processing task.…

Software Engineering · Computer Science 2022-11-18 Sergey Troshin, Nadezhda Chirkova

Using Pre-Trained Models to Boost Code Review Automation

Code review is a practice widely adopted in open source and industrial projects. Given the non-negligible cost of such a process, researchers started investigating the possibility of automating specific code review tasks. We recently…

Software Engineering · Computer Science 2022-01-19 Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk, Gabriele Bavota

Cost-Effective Training of Deep CNNs with Active Model Adaptation

Deep convolutional neural networks have achieved great success in various applications. However, training an effective DNN model for a specific task is rather challenging because it requires a prior knowledge or experience to design the…

Machine Learning · Computer Science 2018-06-06 Sheng-Jun Huang, Jia-Wei Zhao, Zhao-Yang Liu

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently shown to transfer well to Programming Languages (PL) and largely benefit a broad set of code-related tasks. Despite their success, most current methods…

Computation and Language · Computer Science 2021-09-03 Yue Wang, Weishi Wang, Shafiq Joty, Steven C. H. Hoi

Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in…

Software Engineering · Computer Science 2021-02-04 Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, Gabriele Bavota

Automating Code Review Activities by Large-Scale Pre-training

Code review is an essential part to software development lifecycle since it aims at guaranteeing the quality of codes. Modern code review activities necessitate developers viewing, understanding and even running the programs to assess…

Software Engineering · Computer Science 2022-10-12 Zhiyu Li, Shuai Lu, Daya Guo, Nan Duan, Shailesh Jannu, Grant Jenks, Deep Majumder, Jared Green, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt…

Computation and Language · Computer Science 2023-05-23 Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

Using Transfer Learning for Code-Related Tasks

Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved…

Software Engineering · Computer Science 2022-06-20 Antonio Mastropaolo, Nathan Cooper, David Nader Palacio, Simone Scalabrino, Denys Poshyvanyk, Rocco Oliveto, Gabriele Bavota

CodeEditor: Learning to Edit Source Code with Pre-trained Models

Developers often perform repetitive code editing activities for various reasons (e.g., code refactoring) during software development. Pre-trained code editing models have achieved the state-of-the-art (SOTA) results. Pre-trained models are…

Software Engineering · Computer Science 2023-09-08 Jia Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, Zhiyi Fu

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large…

Software Engineering · Computer Science 2023-04-12 Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun

How to Select Pre-Trained Code Models for Reuse? A Learning Perspective

Pre-training a language model and then fine-tuning it has shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However,…

Software Engineering · Computer Science 2025-01-08 Zhangqian Bi, Yao Wan, Zhaoyang Chu, Yufei Hu, Junyi Zhang, Hongyu Zhang, Guandong Xu, Hai Jin

NatGen: Generative pre-training by "Naturalizing" source code

Pre-trained Generative Language models (e.g. PLBART, CodeT5, SPT-Code) for source code yielded strong results on several tasks in the past few years, including code generation and translation. These models have adopted varying pre-training…

Programming Languages · Computer Science 2022-07-07 Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar Devanbu, Baishakhi Ray

CODIT: Code Editing with Tree-Based Neural Models

The way developers edit day-to-day code tends to be repetitive, often using existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates which are applied to limited…

Software Engineering · Computer Science 2022-04-21 Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, Baishakhi Ray

Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning

In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks.…

Machine Learning · Computer Science 2023-12-07 Pin-Yu Chen

CoDocBench: A Dataset for Code-Documentation Alignment in Software Maintenance

One of the central tasks in software maintenance is being able to understand and develop code changes. Thus, given a natural language description of the desired new operation of a function, an agent (human or AI) might be asked to generate…

Software Engineering · Computer Science 2025-02-05 Kunal Pai, Premkumar Devanbu, Toufique Ahmed

Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks

Learning-based techniques, especially advanced pre-trained models for code have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training…

Software Engineering · Computer Science 2025-02-07 Kyi Shin Khant, Hong Yi Lin, Patanamon Thongtanunam

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen

Empirical Study on Transformer-based Techniques for Software Engineering

Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we review the existing literature, examine the suitability of model architectures for different tasks, and look at the…

Software Engineering · Computer Science 2023-10-03 Yan Xiao, Xinyue Zuo, Lei Xue, Kailong Wang, Jin Song Dong, Ivan Beschastnikh