Related papers: JavaBERT: Training a transformer-based model for t…

Code quality assessment using transformers

Automatically evaluate the correctness of programming assignments is rather straightforward using unit and integration tests. However, programming tasks can be solved in multiple ways, many of which, although correct, are inelegant. For…

Computation and Language · Computer Science 2023-09-19 Mosleh Mahamud , Isak Samsten

On the validity of pre-trained transformers for natural language processing in the software engineering domain

Transformers are the current state-of-the-art of natural language processing in many domains and are using traction within software engineering research as well. Such models are pre-trained on large amounts of data, usually from the general…

Software Engineering · Computer Science 2022-05-16 Julian von der Mosel , Alexander Trautsch , Steffen Herbold

JaCoText: A Pretrained Model for Java Code-Text Generation

Pretrained transformer-based models have shown high performance in natural language generation task. However, a new wave of interest has surged: automatic programming language generation. This task consists of translating natural language…

Computation and Language · Computer Science 2023-03-24 Jessica López Espejel , Mahaman Sanoussi Yahaya Alassan , Walid Dahhane , El Hassane Ettifouri

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code, the fact that code, in small code snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models…

Software Engineering · Computer Science 2022-08-15 Ahmed Khanfir , Matthieu Jimenez , Mike Papadakis , Yves Le Traon

On the Effectiveness of Transfer Learning for Code Search

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza , Christoph Schwizer , Jian Gu , Harald C. Gall

Exploring Software Naturalness through Neural Language Models

The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language…

Computation and Language · Computer Science 2020-06-25 Luca Buratti , Saurabh Pujar , Mihaela Bornea , Scott McCarley , Yunhui Zheng , Gaetano Rossiello , Alessandro Morari , Jim Laredo , Veronika Thost , Yufan Zhuang , Giacomo Domeniconi

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar , Romain Robbes

Learning code summarization from a small and local dataset

Foundation models (e.g., CodeBERT, GraphCodeBERT, CodeT5) work well for many software engineering tasks. These models are pre-trained (using self-supervision) with billions of code tokens, and then fine-tuned with hundreds of thousands of…

Software Engineering · Computer Science 2022-06-03 Toufique Ahmed , Premkumar Devanbu

Applying CodeBERT for Automated Program Repair of Java Simple Bugs

Software debugging, and program repair are among the most time-consuming and labor-intensive tasks in software engineering that would benefit a lot from automation. In this paper, we propose a novel automated program repair approach based…

Software Engineering · Computer Science 2021-04-01 Ehsan Mashhadi , Hadi Hemmati

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and nat-ural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language codesearch, code…

Computation and Language · Computer Science 2020-09-21 Zhangyin Feng , Daya Guo , Duyu Tang , Nan Duan , Xiaocheng Feng , Ming Gong , Linjun Shou , Bing Qin , Ting Liu , Daxin Jiang , Ming Zhou

On the Impact of Language Selection for Training and Evaluating Programming Language Models

The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models. The remarkable progress made in this domain not only applies to natural…

Software Engineering · Computer Science 2023-08-28 Jonathan Katzy , Maliheh Izadi , Arie van Deursen

Exploring and Evaluating Personalized Models for Code Generation

Large Transformer models achieved the state-of-the-art status for Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large…

Software Engineering · Computer Science 2022-09-21 Andrei Zlotchevski , Dawn Drain , Alexey Svyatkovskiy , Colin Clement , Neel Sundaresan , Michele Tufano

Transformer-Based Language Models for Software Vulnerability Detection

The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the transferability of the knowledge gained by these models in one domain to other related domains, and the…

Cryptography and Security · Computer Science 2022-09-07 Chandra Thapa , Seung Ick Jang , Muhammad Ejaz Ahmed , Seyit Camtepe , Josef Pieprzyk , Surya Nepal

On The Cross-Modal Transfer from Natural Language to Code through Adapter Modules

Pre-trained neural Language Models (PTLM), such as CodeBERT, are recently used in software engineering as models pre-trained on large source code corpora. Their knowledge is transferred to downstream tasks (e.g. code clone detection) via…

Software Engineering · Computer Science 2022-04-20 Divyam Goel , Ramansh Grover , Fatemeh H. Fard

CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

Currently, a growing number of mature natural language processing applications make people's life more convenient. Such applications are built by source code - the language in software engineering. However, the applications for…

Software Engineering · Computer Science 2021-05-13 Ahmed Elnaggar , Wei Ding , Llion Jones , Tom Gibbs , Tamas Feher , Christoph Angerer , Silvia Severini , Florian Matthes , Burkhard Rost

A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text

Java Code Generation consists in generating automatically Java code from a Natural Language Text. This NLP task helps in increasing programmers' productivity by providing them with immediate solutions to the simplest and most repetitive…

Computation and Language · Computer Science 2023-06-13 Jessica López Espejel , Mahaman Sanoussi Yahaya Alassan , El Mehdi Chouham , Walid Dahhane , El Hassane Ettifouri

STraceBERT: Source Code Retrieval using Semantic Application Traces

Software reverse engineering is an essential task in software engineering and security, but it can be a challenging process, especially for adversarial artifacts. To address this challenge, we present STraceBERT, a novel approach that…

Software Engineering · Computer Science 2023-12-11 Claudio Spiess

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via…

Computation and Language · Computer Science 2021-11-03 Bonan Min , Hayley Ross , Elior Sulem , Amir Pouran Ben Veyseh , Thien Huu Nguyen , Oscar Sainz , Eneko Agirre , Ilana Heinz , Dan Roth

The Diminishing Returns of Masked Language Models to Science

Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks. It has also been demonstrated that the downstream task performance of such models can be improved by…

Computation and Language · Computer Science 2023-05-04 Zhi Hong , Aswathy Ajith , Gregory Pauloski , Eamon Duede , Kyle Chard , Ian Foster

MatSciBERT: A Materials Domain Language Model for Text Mining and Information Extraction

An overwhelmingly large amount of knowledge in the materials domain is generated and stored as text published in peer-reviewed scientific literature. Recent developments in natural language processing, such as bidirectional encoder…

Computation and Language · Computer Science 2021-10-01 Tanishq Gupta , Mohd Zaki , N. M. Anoop Krishnan , Mausam