Related papers: CodeArt: Better Code Models by Attention Regulariz…

GraphCodeBERT: Pre-training Code Representations with Data Flow

Pre-trained models for programming language have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, code summarization, etc. However, existing pre-trained models regard a code…

Software Engineering · Computer Science 2021-09-14 Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou

Efficient pre-training objectives for Transformers

The Transformer architecture deeply changed natural language processing, outperforming all previous state-of-the-art models. However, well-known Transformer models like BERT, RoBERTa, and GPT-2 require a huge compute budget to create a…

Computation and Language · Computer Science 2021-04-21 Luca Di Liello, Matteo Gabburo, Alessandro Moschitti

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

Naturalness of Attention: Revisiting Attention in Code Language Models

Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties. Recent attention analysis studies provide initial…

Software Engineering · Computer Science 2023-11-23 Mootez Saad, Tushar Sharma

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

CodeEditor: Learning to Edit Source Code with Pre-trained Models

Developers often perform repetitive code editing activities for various reasons (e.g., code refactoring) during software development. Pre-trained code editing models have achieved the state-of-the-art (SOTA) results. Pre-trained models are…

Software Engineering · Computer Science 2023-09-08 Jia Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, Zhiyi Fu

Diet Code Is Healthy: Simplifying Programs for Pre-trained Models of Code

Pre-trained code representation models such as CodeBERT have demonstrated superior performance in a variety of software engineering tasks, yet they are often heavy in complexity, quadratically with the length of the input sequence. Our…

Software Engineering · Computer Science 2022-11-22 Zhaowei Zhang, Hongyu Zhang, Beijun Shen, Xiaodong Gu

StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training

Most state-of-the-art techniques for Language Models (LMs) today rely on transformer-based architectures and their ubiquitous attention mechanism. However, the exponential growth in computational requirements with longer input sequences…

Computation and Language · Computer Science 2024-11-26 Kaustubh Ponkshe, Venkatapathy Subramanian, Natwar Modani, Ganesh Ramakrishnan

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar, Romain Robbes

The Diminishing Returns of Masked Language Models to Science

Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks. It has also been demonstrated that the downstream task performance of such models can be improved by…

Computation and Language · Computer Science 2023-05-04 Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster

Meta-learning autoencoders for few-shot prediction

Compared to humans, machine learning models generally require significantly more training examples and fail to extrapolate from experience to solve previously unseen challenges. To help close this performance gap, we augment single-task…

Machine Learning · Computer Science 2018-07-27 Tailin Wu, John Peurifoy, Isaac L. Chuang, Max Tegmark

CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure

Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. To interpret these models, some probing methods have been applied. However, these methods fail to consider the inherent characteristics…

Software Engineering · Computer Science 2022-12-13 Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers

Pre-trained models of source code have recently been successfully applied to a wide variety of Software Engineering tasks; they have also seen some practical adoption, e.g. for code completion. Yet, we still know very little…

Software Engineering · Computer Science 2023-12-11 Anjan Karmakar, Romain Robbes

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations. Is this actually true? We investigate this question and find that the features and representations…

Machine Learning · Computer Science 2024-11-15 Alexander C. Li, Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen

An Exploratory Study on Code Attention in BERT

Many recent models in software engineering introduced deep neural models based on the Transformer architecture or use transformer-based Pre-trained Language Models (PLM) trained on code. Although these models achieve the state of the arts…

Software Engineering · Computer Science 2022-04-22 Rishab Sharma, Fuxiang Chen, Fatemeh Fard, David Lo

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks including automatic code completion which is a staple in a developer's toolkit. While many have striven to improve the…

Computation and Language · Computer Science 2023-04-25 Tim van Dam, Maliheh Izadi, Arie van Deursen

On the Effectiveness of Transfer Learning for Code Search

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza, Christoph Schwizer, Jian Gu, Harald C. Gall

Preconditioned Attention: Enhancing Efficiency in Transformers

Central to the success of Transformers is the attention block, which effectively models global dependencies among input tokens associated to a dataset. However, we theoretically demonstrate that standard attention mechanisms in transformers…

Machine Learning · Computer Science 2026-03-31 Hemanth Saratchandran

Empirical Study on Transformer-based Techniques for Software Engineering

Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we review the existing literature, examine the suitability of model architectures for different tasks, and look at the…

Software Engineering · Computer Science 2023-10-03 Yan Xiao, Xinyue Zuo, Lei Xue, Kailong Wang, Jin Song Dong, Ivan Beschastnikh

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives…

Computation and Language · Computer Science 2022-04-13 Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel