Related papers: Evaluating Source Code Quality with Large Language…

Assessing the Quality and Security of AI-Generated Code: A Quantitative Analysis

This study presents a quantitative evaluation of the code quality and security of five prominent Large Language Models (LLMs): Claude Sonnet 4, Claude 3.7 Sonnet, GPT-4o, Llama 3.2 90B, and OpenCoder 8B. While prior research has assessed…

Software Engineering · Computer Science · 2025-08-21 · Abbas Sabra, Olivier Schmitt, Joseph Tyler

Frustrated with Code Quality Issues? LLMs can Help!

As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality…

Artificial Intelligence · Computer Science · 2023-09-25 · Nalin Wadhwa, Jui Pradhan, Atharv Sonwane, Surya Prakash Sahu, Nagarajan Natarajan, Aditya Kanade, Suresh Parthasarathy, Sriram Rajamani

Is LLM-Generated Code More Maintainable & Reliable than Human-Written Code?

Background: The rise of Large Language Models (LLMs) in software development has opened new possibilities for code generation. Despite the widespread use of this technology, it remains unclear how well LLMs generate code solutions in terms…

Software Engineering · Computer Science · 2025-08-04 · Alfred Santa Molison, Marcia Moraes, Glaucia Melo, Fabio Santos, Wesley K. G. Assuncao

Measuring Determinism in Large Language Models for Software Code Review

Large Language Models (LLMs) promise to streamline software code reviews, but their ability to produce consistent assessments remains an open question. In this study, we tested four leading LLMs -- GPT-4o mini, GPT-4o, Claude 3.5 Sonnet,…

Software Engineering · Computer Science · 2025-03-03 · Eugene Klishevich, Yegor Denisov-Blanch, Simon Obstbaum, Igor Ciobanu, Michal Kosinski

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science · 2026-05-22 · Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models

Code readability is one of the main aspects of code quality, influenced by various properties like identifier names, comments, code structure, and adherence to standards. However, measuring this attribute poses challenges in both industry…

Software Engineering · Computer Science · 2025-10-21 · Igor Regis da Silva Simoes, Elaine Venson

Static Analysis as a Feedback Loop: Enhancing LLM-Generated Code Beyond Correctness

Large language models (LLMs) have demonstrated impressive capabilities in code generation, achieving high scores on benchmarks such as HumanEval and MBPP. However, these benchmarks primarily assess functional correctness and neglect broader…

Software Engineering · Computer Science · 2025-08-21 · Scott Blyth, Sherlock A. Licorish, Christoph Treude, Markus Wagner

From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models

Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable behavior. In this article, we systematically…

Software Engineering · Computer Science · 2026-02-26 · Norman Peitek, Julia Hess, Sven Apel

Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection

Modern software relies on a multitude of automated testing and quality assurance tools to prevent errors, bugs and potential vulnerabilities. This study sets out to provide a head-to-head, quantitative and qualitative evaluation of six…

Software Engineering · Computer Science · 2025-08-07 · Damian Gnieciak, Tomasz Szandala

WALL: A Web Application for Automated Quality Assurance using Large Language Models

As software projects become increasingly complex, the volume and variety of issues in code files have grown substantially. Addressing this challenge requires efficient issue detection, resolution, and evaluation tools. This paper presents…

Software Engineering · Computer Science · 2025-09-15 · Seyed Moein Abtahi, Akramul Azim

Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model…

Software Engineering · Computer Science · 2026-04-15 · Sabiya Banu Masthan Ali, Oussema Kirmani, Aroosa Hameed, Syed Muhammad Danish, Gautam Srivastava

Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements

This study examined code issue detection and revision automation by integrating Large Language Models (LLMs) such as OpenAI's GPT-3.5 Turbo and GPT-4o into software development workflows. A static code analysis framework detects issues such…

Software Engineering · Computer Science · 2025-06-13 · Seyed Moein Abtahi, Akramul Azim

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…

Cryptography and Security · Computer Science · 2025-02-14 · Karl Tamberg, Hayretdin Bahsi

Rethinking the Evaluation of Secure Code Generation

Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current…

Cryptography and Security · Computer Science · 2025-11-14 · Shih-Chieh Dai, Jun Xu, Guanhong Tao

Software Code Quality Measurement: Implications from Metric Distributions

Software code quality is a construct with three dimensions: maintainability, reliability, and functionality. Although many firms have incorporated code quality metrics in their operations, evaluating these metrics still lacks consistent…

Software Engineering · Computer Science · 2024-01-17 · Siyuan Jin, Mianmian Zhang, Yekai Guo, Yuejiang He, Ziyuan Li, Bichao Chen, Bing Zhu, Yong Xia

Efficacy of static analysis tools for software defect detection on open-source projects

In software practice, static analysis tools remain an integral part of detecting defects in software and there have been various tools designed to run the analysis in different programming languages like Java, C++, and Python. This paper…

Software Engineering · Computer Science · 2024-05-22 · Jones Yeboah, Saheed Popoola

Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis

Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence, and the software engineering community has readily noticed their potential role in the software development life-cycle.…

Software Engineering · Computer Science · 2026-03-16 · Greta Dolcetti, Vincenzo Arceri, Eleonora Iotti, Sergio Maffeis, Agostino Cortesi, Enea Zaffanella

Human-Aligned Code Readability Assessment with Large Language Models

Code readability is crucial for software comprehension and maintenance, yet difficult to assess at scale. Traditional static metrics often fail to capture the subjective, context-sensitive nature of human judgments. Large Language Models…

Software Engineering · Computer Science · 2025-10-21 · Wendkûuni C. Ouédraogo, Yinghua Li, Xueqi Dang, Pawel Borsukiewicz, Xin Zhou, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection

Code smells are symptoms of potential code quality problems that may affect software maintainability, thus increasing development costs and impacting software reliability. Large language models (LLMs) have shown remarkable capabilities for…

Software Engineering · Computer Science · 2026-01-16 · Saymon Souza, Amanda Santana, Eduardo Figueiredo, Igor Muzetti, João Eduardo Montandon, Lionel Briand

Exploring the Robustness of Large Language Models for Solving Programming Problems

Using large language models (LLMs) for source code has recently gained attention. LLMs, such as Transformer-based models like Codex and ChatGPT, have been shown to be highly capable of solving a wide range of programming problems. However,…

Computation and Language · Computer Science · 2023-06-27 · Atsushi Shirafuji, Yutaka Watanobe, Takumi Ito, Makoto Morishita, Yuki Nakamura, Yusuke Oda, Jun Suzuki