Related papers: Code quality assessment using transformers

JavaBERT: Training a transformer-based model for the Java programming language

Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing…

Software Engineering · Computer Science 2021-10-22 Nelson Tavares de Sousa, Wilhelm Hasselbring

Applying CodeBERT for Automated Program Repair of Java Simple Bugs

Software debugging and program repair are among the most time-consuming and labor-intensive tasks in software engineering that would benefit a lot from automation. In this paper, we propose a novel automated program repair approach based…

Software Engineering · Computer Science 2021-04-01 Ehsan Mashhadi, Hadi Hemmati

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar, Romain Robbes

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code, the fact that code, in small code snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models…

Software Engineering · Computer Science 2022-08-15 Ahmed Khanfir, Matthieu Jimenez, Mike Papadakis, Yves Le Traon

Empirical Study on Transformer-based Techniques for Software Engineering

Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we review the existing literature, examine the suitability of model architectures for different tasks, and look at the…

Software Engineering · Computer Science 2023-10-03 Yan Xiao, Xinyue Zuo, Lei Xue, Kailong Wang, Jin Song Dong, Ivan Beschastnikh

Code Vulnerability Detection Across Different Programming Languages with AI Models

Security vulnerabilities present in a code that has been written in diverse programming languages are among the most critical yet complicated aspects of source code to detect. Static analysis tools based on rule-based patterns usually do…

Cryptography and Security · Computer Science 2025-08-19 Hael Abdulhakim Ali Humran, Ferdi Sonmez

On the Impact of Language Selection for Training and Evaluating Programming Language Models

The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models. The remarkable progress made in this domain not only applies to natural…

Software Engineering · Computer Science 2023-08-28 Jonathan Katzy, Maliheh Izadi, Arie van Deursen

Quality Estimation & Interpretability for Code Translation

Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such…

Software Engineering · Computer Science 2021-04-28 Mayank Agarwal, Kartik Talamadupula, Stephanie Houde, Fernando Martinez, Michael Muller, John Richards, Steven Ross, Justin D. Weisz

Learning code summarization from a small and local dataset

Foundation models (e.g., CodeBERT, GraphCodeBERT, CodeT5) work well for many software engineering tasks. These models are pre-trained (using self-supervision) with billions of code tokens, and then fine-tuned with hundreds of thousands of…

Software Engineering · Computer Science 2022-06-03 Toufique Ahmed, Premkumar Devanbu

Quality Evaluation of COBOL to Java Code Transformation

We present an automated evaluation system for assessing COBOL-to-Java code translation within IBM's watsonx Code Assistant for Z (WCA4Z). The system addresses key challenges in evaluating LLM-based translators, including model opacity and…

Software Engineering · Computer Science 2025-08-01 Shmulik Froimovich, Raviv Gal, Wesam Ibraheem, Avi Ziv

Generalizability of Code Clone Detection on CodeBERT

Transformer networks such as CodeBERT already achieve outstanding results for code clone detection in benchmark datasets, so one could assume that this task has already been solved. However, code clone detection is not a trivial task.…

Software Engineering · Computer Science 2022-09-02 Tim Sonnekalb, Bernd Gruner, Clemens-Alexander Brust, Patrick Mäder

Exploring and Evaluating Personalized Models for Code Generation

Large Transformer models achieved the state-of-the-art status for Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large…

Software Engineering · Computer Science 2022-09-21 Andrei Zlotchevski, Dawn Drain, Alexey Svyatkovskiy, Colin Clement, Neel Sundaresan, Michele Tufano

Detecting Code Quality Issues in Pre-written Templates of Programming Tasks in Online Courses

In this work, we developed an algorithm for detecting code quality issues in the templates of online programming tasks, validated it, and conducted an empirical study on the dataset of student solutions. The algorithm consists of analyzing…

Software Engineering · Computer Science 2023-04-26 Anastasiia Birillo, Elizaveta Artser, Yaroslav Golubev, Maria Tigina, Hieke Keuning, Nikolay Vyahhi, Timofey Bryksin

QualiTagger: Automating software quality detection in issue trackers

A system's quality is a major concern for development teams as it evolves. Understanding the effects of a loss of quality in the codebase is crucial to avoid side effects like the appearance of technical debt. Although the identification of…

Software Engineering · Computer Science 2025-04-16 Karthik Shivashankar, Rafael Capilla, Maren Maritsdatter Kruke, Mili Orucevic, Antonio Martini

Enhancing Source Code Classification Effectiveness via Prompt Learning Incorporating Knowledge Features

Researchers have investigated the potential of leveraging pre-trained language models, such as CodeBERT, to enhance source code-related tasks. Previous methodologies have relied on CodeBERT's '[CLS]' token as the embedding representation of…

Computation and Language · Computer Science 2024-09-04 Yong Ma, Senlin Luo, Yu-Ming Shang, Yifei Zhang, Zhengjun Li

Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues

In today's digital landscape, the importance of timely and accurate vulnerability detection has significantly increased. This paper presents a novel approach that leverages transformer-based models and machine learning techniques to…

Software Engineering · Computer Science 2025-01-10 Daniele Cipollone, Changjie Wang, Mariano Scazzariello, Simone Ferlin, Maliheh Izadi, Dejan Kostic, Marco Chiesa

Evaluating software defect prediction performance: an updated benchmarking study

Accurately predicting faulty software units helps practitioners target faulty units and prioritize their efforts to maintain software quality. Prior studies use machine-learning models to detect faulty software code. We revisit past studies…

Software Engineering · Computer Science 2019-01-08 Libo Li, Stefan Lessmann, Bart Baesens

Test Code Refactoring Unveiled: Where and How Does It Affect Test Code Quality and Effectiveness?

Context. Refactoring has been widely investigated in the past in relation to production code quality, yet still little is known on how developers apply refactoring on test code. Specifically, there is still a lack of investigation into how…

Software Engineering · Computer Science 2023-08-21 Luana Martins, Valeria Pontillo, Heitor Costa, Filomena Ferrucci, Fabio Palomba, Ivan Machado

Will My Tests Tell Me If I Break This Code?

Automated tests play an important role in software evolution because they can rapidly detect faults introduced during changes. In practice, code-coverage metrics are often used as criteria to evaluate the effectiveness of test suites with…

Software Engineering · Computer Science 2016-11-23 Rainer Niedermayr, Elmar Juergens, Stefan Wagner

Assessing Dataset Quality Through Decision Tree Characteristics in Autoencoder-Processed Spaces

In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels,…

Machine Learning · Computer Science 2023-06-28 Szymon Mazurek, Maciej Wielgosz