Related papers: Improving Code Autocompletion with Transfer Learni…

Learning Autocompletion from Real-World Datasets

Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study When…

Software Engineering · Computer Science 2020-11-10 Gareth Ari Aye, Seohyun Kim, Hongyu Li

Multi-task Learning based Pre-trained Language Model for Code Completion

Code completion is one of the most useful features in the Integrated Development Environments (IDEs), which can accelerate software development by suggesting the next probable token based on the contextual code in real-time. Recent studies…

Software Engineering · Computer Science 2021-01-01 Fang Liu, Ge Li, Yunfei Zhao, Zhi Jin

Language Models for Code Completion: A Practical Evaluation

Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public…

Software Engineering · Computer Science 2024-02-27 Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

Towards Full-line Code Completion with Neural Language Models

A code completion system suggests future code elements to developers given a partially-complete code snippet. Code completion is one of the most useful features in Integrated Development Environments (IDEs). Currently, most code completion…

Software Engineering · Computer Science 2020-09-21 Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin

A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be…

Software Engineering · Computer Science 2024-05-24 Aral de Moor, Arie van Deursen, Maliheh Izadi

Automating Code-Related Tasks Through Transformers: The Impact of Pre-training

Transformers have gained popularity in the software engineering (SE) literature. These deep learning models are usually pre-trained through a self-supervised objective, meant to provide the model with basic knowledge about a language of…

Software Engineering · Computer Science 2023-02-09 Rosalia Tufano, Luca Pascarella, Gabriele Bavota

CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences

Code completion is an essential feature of IDEs, yet current autocompleters are restricted to either grammar-based or NLP-based single token completions. Both approaches have significant drawbacks: grammar-based autocompletion is restricted…

Software Engineering · Computer Science 2022-02-15 Maliheh Izadi, Roberta Gismondi, Georgios Gousios

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks including automatic code completion which is a staple in a developer's toolkit. While many have striven to improve the…

Computation and Language · Computer Science 2023-04-25 Tim van Dam, Maliheh Izadi, Arie van Deursen

Code Pretraining Improves Entity Tracking Abilities of Language Models

Recent work has provided indirect evidence that pretraining language models on code improves the ability of models to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim…

Computation and Language · Computer Science 2024-06-03 Najoung Kim, Sebastian Schuster, Shubham Toshniwal

Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets

Self-supervised learning has emerged as a powerful approach for leveraging large-scale unlabeled data to improve model performance in various domains. In this paper, we explore masked self-supervised pre-training for text recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Martin Kišš, Michal Hradiš

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning…

Computation and Language · Computer Science 2024-08-30 Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu

On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions

Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are…

Software Engineering · Computer Science 2024-03-25 Matteo Ciniselli, Alberto Martin-Lopez, Gabriele Bavota

Predictions For Pre-training Language Models

Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the self-training method in the pre-training step and the fine-tuning step. Towards…

Computation and Language · Computer Science 2023-02-17 Tong Guo

The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding

While recent benchmarks have spurred a lot of new work on improving the generalization of pretrained multilingual language models on multilingual tasks, techniques to improve code-switched natural language understanding tasks have been far…

Computation and Language · Computer Science 2021-07-22 Archiki Prasad, Mohammad Ali Rehan, Shreya Pathak, Preethi Jyothi

Transductive Auxiliary Task Self-Training for Neural Multi-Task Models

Multi-task learning and self-training are two common ways to improve a machine learning model's performance in settings with limited training data. Drawing heavily on ideas from those two approaches, we suggest transductive auxiliary task…

Computation and Language · Computer Science 2019-09-24 Johannes Bjerva, Katharina Kann, Isabelle Augenstein

Unsupervised Learning of General-Purpose Embeddings for Code Changes

Applying machine learning to tasks that operate with code changes requires their numerical representation. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two different…

Software Engineering · Computer Science 2021-07-12 Mikhail Pravilov, Egor Bogomolov, Yaroslav Golubev, Timofey Bryksin

How Does Code Pretraining Affect Language Model Task Performance?

Large language models are increasingly trained on corpora containing both natural language and non-linguistic data like source code. Aside from aiding programming-related tasks, anecdotal evidence suggests that including code in pretraining…

Computation and Language · Computer Science 2025-02-26 Jackson Petty, Sjoerd van Steenkiste, Tal Linzen

Sequence Model Design for Code Completion in the Modern IDE

Code completion plays a prominent role in modern integrated development environments (IDEs). Machine learning has become ubiquitous in analogous natural language writing and search software, surfacing more relevant autocompletions and…

Software Engineering · Computer Science 2020-04-14 Gareth Ari Aye, Gail E. Kaiser

Context Composing for Full Line Code Completion

Code Completion is one of the most used Integrated Development Environment (IDE) features, which affects the everyday life of a software developer. Modern code completion approaches moved from the composition of several static…

Software Engineering · Computer Science 2024-02-15 Anton Semenkin, Yaroslav Sokolov, Evgeniia Vu

Improving Cross-Lingual Reading Comprehension with Self-Training

Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context. Current state-of-the-art models even surpass human performance on several benchmarks. However, their…

Computation and Language · Computer Science 2021-05-11 Wei-Cheng Huang, Chien-yu Huang, Hung-yi Lee