Related papers: Language Modelling for Source Code with Transforme…

Maybe Deep Neural Networks are the Best Choice for Modeling Source Code

Statistical language modeling techniques have successfully been applied to source code, yielding a variety of new software development tools, such as tools for code suggestion and improving readability. A major issue with these techniques…

Software Engineering · Computer Science 2019-03-15 Rafael-Michael Karampatsis, Charles Sutton

Exploring Software Naturalness through Neural Language Models

The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language…

Computation and Language · Computer Science 2020-06-25 Luca Buratti, Saurabh Pujar, Mihaela Bornea, Scott McCarley, Yunhui Zheng, Gaetano Rossiello, Alessandro Morari, Jim Laredo, Veronika Thost, Yufan Zhuang, Giacomo Domeniconi

Empirical Study of Transformers for Source Code

Initially developed for natural language processing (NLP), Transformers are now widely used for source code processing, due to the format similarity between source code and text. In contrast to natural language, source code is strictly…

Machine Learning · Computer Science 2021-06-25 Nadezhda Chirkova, Sergey Troshin

Transformer-Based Language Models for Software Vulnerability Detection

The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the transferability of the knowledge gained by these models in one domain to other related domains, and the…

Cryptography and Security · Computer Science 2022-09-07 Chandra Thapa, Seung Ick Jang, Muhammad Ejaz Ahmed, Seyit Camtepe, Josef Pieprzyk, Surya Nepal

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. Unlike previous works, we integrate software…

Computation and Language · Computer Science 2024-06-27 Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, Rui Wang

Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets

Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing…

Software Engineering · Computer Science 2025-03-25 Hamed Jelodar, Mohammad Meymani, Roozbeh Razavi-Far

A Survey on Natural Language Processing for Programming

Natural language processing for programming aims to use NLP techniques to assist programming. It is increasingly prevalent for its effectiveness in improving productivity. Distinct from natural language, a programming language is highly…

Computation and Language · Computer Science 2023-08-08 Qingfu Zhu, Xianzhen Luo, Fang Liu, Cuiyun Gao, Wanxiang Che

Exploring Large Language Models for Code Explanation

Automating code documentation through explanatory text can prove highly beneficial in code understanding. Large Language Models (LLMs) have made remarkable strides in Natural Language Processing, especially within software engineering tasks…

Software Engineering · Computer Science 2023-10-26 Paheli Bhattacharya, Manojit Chakraborty, Kartheek N S N Palepu, Vikas Pandey, Ishan Dindorkar, Rakesh Rajpurohit, Rishabh Gupta

A Survey of Machine Learning for Big Code and Naturalness

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In…

Software Engineering · Computer Science 2018-05-08 Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton

A Comparative Study on Code Generation with Transformers

In an era of widespread influence of Natural Language Processing (NLP), there have been multiple research efforts to supplant traditional manual coding techniques with automated systems capable of generating solutions autonomously. With…

Computation and Language · Computer Science 2024-12-10 Namrata Das, Rakshya Panta, Neelam Karki, Ruchi Manandhar, Dinesh Baniya Kshatri

Neural Models for Source Code Synthesis and Completion

Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippet. The current approaches mainly involve hard-coded, rule-based systems…

Software Engineering · Computer Science 2024-02-13 Mitodru Niyogi

Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches

In recent years, the use of deep learning in language models gained much attention. Some research projects claim that they can generate text that can be interpreted as human-writing, enabling new possibilities in many application areas.…

Computation and Language · Computer Science 2021-01-13 Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martin-Fernandez, Ismael Faro

Large Language Models are not Models of Natural Language: they are Corpus Models

Natural Language Processing (NLP) has become one of the leading application areas in the current Artificial Intelligence boom. Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly…

Computation and Language · Computer Science 2022-06-16 Csaba Veres

Quality Estimation & Interpretability for Code Translation

Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such…

Software Engineering · Computer Science 2021-04-28 Mayank Agarwal, Kartik Talamadupula, Stephanie Houde, Fernando Martinez, Michael Muller, John Richards, Steven Ross, Justin D. Weisz

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in…

Software Engineering · Computer Science 2020-06-16 Triet H. M. Le, Hao Chen, M. Ali Babar

Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?

Large Language Models (LLMs) are increasingly being applied across various domains, including code-related tasks such as code translation. Previous studies have explored using LLMs for translating code between different programming…

Software Engineering · Computer Science 2026-05-05 Soumit Kanti Saha, Fazle Rabbi, Song Wang, Jinqiu Yang

Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code

Understanding source code is a topic of great interest in the software engineering community, since it can help programmers in various tasks such as software maintenance and reuse. Recent advances in large language models (LLMs) have…

Software Engineering · Computer Science 2025-04-25 Michele Carissimi, Martina Saletta, Claudio Ferretti

Using LSTMs to Model the Java Programming Language

Recurrent neural networks (RNNs), specifically long-short term memory networks (LSTMs), can model natural language effectively. This research investigates the ability for these same LSTMs to perform next "word" prediction on the Java…

Software Engineering · Computer Science 2019-09-02 Brendon Boldt

A Survey on Large Language Models from Concept to Implementation

Recent advancements in Large Language Models (LLMs), particularly those built on Transformer architectures, have significantly broadened the scope of natural language processing (NLP) applications, transcending their initial use in chatbot…

Computation and Language · Computer Science 2024-05-29 Chen Wang, Jin Zhao, Jiaqi Gong

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional…

Machine Learning · Computer Science 2023-11-28 Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica