Related papers: Multi-task Learning based Pre-trained Language Mod…

A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning

Code completion, one of the most useful features in the Integrated Development Environments (IDEs), can accelerate software development by suggesting the libraries, APIs, and method names in real-time. Recent studies have shown that…

Software Engineering · Computer Science 2020-06-29 Fang Liu, Ge Li, Bolin Wei, Xin Xia, Zhiyi Fu, Zhi Jin

Towards Full-line Code Completion with Neural Language Models

A code completion system suggests future code elements to developers given a partially-complete code snippet. Code completion is one of the most useful features in Integrated Development Environments (IDEs). Currently, most code completion…

Software Engineering · Computer Science 2020-09-21 Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin

CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences

Code completion is an essential feature of IDEs, yet current autocompleters are restricted to either grammar-based or NLP-based single token completions. Both approaches have significant drawbacks: grammar-based autocompletion is restricted…

Software Engineering · Computer Science 2022-02-15 Maliheh Izadi, Roberta Gismondi, Georgios Gousios

Sequence Model Design for Code Completion in the Modern IDE

Code completion plays a prominent role in modern integrated development environments (IDEs). Machine learning has become ubiquitous in analogous natural language writing and search software, surfacing more relevant autocompletions and…

Software Engineering · Computer Science 2020-04-14 Gareth Ari Aye, Gail E. Kaiser

Improving Code Autocompletion with Transfer Learning

Software language models have achieved promising results predicting code completion usages, and several industry studies have described successful IDE integrations. Recently, accuracy in autocompletion prediction improved 12.8% from…

Software Engineering · Computer Science 2021-10-14 Wen Zhou, Seohyun Kim, Vijayaraghavan Murali, Gareth Ari Aye

An Empirical Study on the Usage of Transformer Models for Code Completion

Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made…

Software Engineering · Computer Science 2021-11-19 Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Antonio Mastropaolo, Emad Aghajani, Denys Poshyvanyk, Massimiliano Di Penta, Gabriele Bavota

Combining Code Embedding with Static Analysis for Function-Call Completion

Code completion is an important feature of integrated development environments (IDEs). It allows developers to produce code faster, especially novice ones who are not fully familiar with APIs and others' code. Previous works on code…

Software Engineering · Computer Science 2020-11-03 M. Weyssow, H. Sahraoui, B. Frénay, B. Vanderose

Learning Autocompletion from Real-World Datasets

Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study When…

Software Engineering · Computer Science 2020-11-10 Gareth Ari Aye, Seohyun Kim, Hongyu Li

Deep Learning-based Code Completion: On the Impact on Performance of Contextual Information

Code completion aims at speeding up code writing by recommending to developers the next tokens they are likely to type. Deep Learning (DL) models pushed the boundaries of code completion by redefining what these coding assistants can do: We…

Software Engineering · Computer Science 2025-01-10 Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be…

Software Engineering · Computer Science 2024-05-24 Aral de Moor, Arie van Deursen, Maliheh Izadi

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks, including automatic code completion, which is a staple in a developer's toolkit. While many have striven to improve the…

Computation and Language · Computer Science 2023-04-25 Tim van Dam, Maliheh Izadi, Arie van Deursen

Code Execution with Pre-trained Language Models

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and…

Programming Languages · Computer Science 2023-05-10 Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan

On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions

Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are…

Software Engineering · Computer Science 2024-03-25 Matteo Ciniselli, Alberto Martin-Lopez, Gabriele Bavota

Language Modeling with Learned Meta-Tokens

While modern Transformer-based language models (LMs) have achieved major success in multi-task generalization, they often struggle to capture long-range dependencies within their context window. This work introduces a novel approach using…

Computation and Language · Computer Science 2025-09-23 Alok N. Shah, Khush Gupta, Keshav Ramji, Pratik Chaudhari

Language Models for Code Completion: A Practical Evaluation

Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public…

Software Engineering · Computer Science 2024-02-27 Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

Do Pre-trained Language Models Indeed Understand Software Engineering Tasks?

Artificial intelligence (AI) for software engineering (SE) tasks has recently achieved promising performance. In this paper, we investigate to what extent the pre-trained language model truly understands those SE tasks such as code search,…

Software Engineering · Computer Science 2022-11-22 Yao Li, Tao Zhang, Xiapu Luo, Haipeng Cai, Sen Fang, Dawei Yuan

Are Multilingual Models Effective in Code-Switching?

Multilingual language models have shown decent performance in multilingual and cross-lingual natural language understanding tasks. However, the power of these multilingual models in code-switching tasks has not been fully explored. In this…

Computation and Language · Computer Science 2021-03-25 Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Pascale Fung

Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence.…

Computation and Language · Computer Science 2026-04-17 Atsuki Yamaguchi, Maggie Mi, Nikolaos Aletras

Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension

Large language models (LLMs) have experienced exponential growth and demonstrate remarkable performance across various tasks. Notwithstanding, contemporary research primarily centers on enhancing the size and quality of pretraining data,…

Programming Languages · Computer Science 2024-04-16 Mengnan Qi, Yufan Huang, Yongqiang Yao, Maoquan Wang, Bin Gu, Neel Sundaresan

TRACED: Execution-aware Pre-training for Source Code

Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax trees, dependency graphs, etc.). However, program semantics will not be fully…

Software Engineering · Computer Science 2023-06-14 Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray