Related papers: Code Compliance Assessment as a Learning Problem

Learning and Evaluating Contextual Embedding of Source Code

Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come…

Software Engineering · Computer Science 2020-08-19 Aditya Kanade , Petros Maniatis , Gogul Balakrishnan , Kensen Shi

Measuring design compliance using neural language models -- an automotive case study

As the modern vehicle becomes more software-defined, it is beginning to take significant effort to avoid serious regression in software design. This is because automotive software architects rely largely upon manual review of code to spot…

Software Engineering · Computer Science 2022-08-30 Dhasarathy Parthasarathy , Cecilia Ekelin , Anjali Karri , Jiapeng Sun , Panagiotis Moraitis

On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation

The task of code generation from natural language (NL2Code) has become extremely popular, especially with the advent of Large Language Models (LLMs). However, efforts to quantify and track this progress have suffered due to a lack of…

Software Engineering · Computer Science 2024-05-06 Atharva Naik

Code Smell Detection via Pearson Correlation and ML Hyperparameter Optimization

This study addresses the challenge of detecting code smells in large-scale software systems using machine learning (ML). Traditional detection methods often suffer from low accuracy and poor generalization across different datasets. To…

Computational Engineering, Finance, and Science · Computer Science 2025-10-08 Moinuddin Muhammad Imtiaz Bhuiyan , Kazi Ekramul Hoque , Rakibul Islam , Md. Mahbubur Rahman Tusher , Najmul Hassan , Yoichi Tomioka , Satoshi Nishimura , Jungpil Shin , Abu Saleh Musa Miah

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma , Zhihao Lin , Shangqing Liu , Qiang Hu , Ye Liu , Wenhan Wang , Cen Zhang , Liming Nie , Li Li , Yang Liu , Lingxiao Jiang

CIFE: Code Instruction-Following Evaluation

Large Language Models (LLMs) are increasingly applied to real-world code generation, where functional correctness alone is insufficient for reliable deployment, developers also expect adherence to explicit requirements for robustness,…

Software Engineering · Computer Science 2025-12-22 Sravani Gunnu , Shanmukha Guttula , Hima Patel

Optimizing Code Embeddings and ML Classifiers for Python Source Code Vulnerability Detection

In recent years, the growing complexity and scale of source code have rendered manual software vulnerability detection increasingly impractical. To address this challenge, automated approaches leveraging machine learning and code embeddings…

Software Engineering · Computer Science 2025-09-17 Talaya Farasat , Joachim Posegga

A Machine Learning Based Framework for Code Clone Validation

A code clone is a pair of code fragments, within or between software systems that are similar. Since code clones often negatively impact the maintainability of a software system, several code clone detection techniques and tools have been…

Software Engineering · Computer Science 2020-05-05 Golam Mostaeen , Banani Roy , Chanchal Roy , Kevin Schneider , Jeffrey Svajlenko

Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

The costly human effort required to prepare the training data of machine learning (ML) models hinders their practical development and usage in software engineering (ML4Code), especially for those with limited budgets. Therefore, efficiently…

Software Engineering · Computer Science 2023-06-05 Qiang Hu , Yuejun Guo , Xiaofei Xie , Maxime Cordy , Lei Ma , Mike Papadakis , Yves Le Traon

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code, the fact that code, in small code snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models…

Software Engineering · Computer Science 2022-08-15 Ahmed Khanfir , Matthieu Jimenez , Mike Papadakis , Yves Le Traon

An Empirical Validation of Cognitive Complexity as a Measure of Source Code Understandability

Background: Developers spend a lot of their time on understanding source code. Static code analysis tools can draw attention to code that is difficult for developers to understand. However, most of the findings are based on non-validated…

Software Engineering · Computer Science 2020-07-27 Marvin Muñoz Barón , Marvin Wyrich , Stefan Wagner

Selecting and Combining Large Language Models for Scalable Code Clone Detection

Source code clones pose risks ranging from intellectual property violations to unintended vulnerabilities. Effective and efficient scalable clone detection, especially for diverged clones, remains challenging. Large language models (LLMs)…

Software Engineering · Computer Science 2025-10-20 Muslim Chochlov , Gul Aftab Ahmed , James Vincent Patten , Yuanhua Han , Guoxian Lu , David Gregg , Jim Buckley

Evaluating and Achieving Controllable Code Completion in Code LLM

Code completion has become a central task, gaining significant attention with the rise of large language model (LLM)-based tools in software engineering. Although recent advances have greatly improved LLMs' code completion abilities,…

Software Engineering · Computer Science 2026-01-23 Jiajun Zhang , Zeyu Cui , Lei Zhang , Jian Yang , Jiaxi Yang , Qiang Liu , Zilei Wang , Binyuan Hui , Liang Wang , Junyang Lin

Measuring Coding Challenge Competence With APPS

While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating…

Software Engineering · Computer Science 2021-11-10 Dan Hendrycks , Steven Basart , Saurav Kadavath , Mantas Mazeika , Akul Arora , Ethan Guo , Collin Burns , Samir Puranik , Horace He , Dawn Song , Jacob Steinhardt

The Struggles of LLMs in Cross-lingual Code Clone Detection

With the involvement of multiple programming languages in modern software development, cross-lingual code clone detection has gained traction within the software engineering community. Numerous studies have explored this topic, proposing…

Software Engineering · Computer Science 2025-05-07 Micheline Bénédicte Moumoula , Abdoul Kader Kabore , Jacques Klein , Tegawendé Bissyande

ICE-Score: Instructing Large Language Models to Evaluate Code

Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine…

Artificial Intelligence · Computer Science 2024-01-23 Terry Yue Zhuo

Predicting Code Coverage without Execution

Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution…

Software Engineering · Computer Science 2023-07-26 Michele Tufano , Shubham Chandel , Anisha Agarwal , Neel Sundaresan , Colin Clement

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian , Chenhui Cui , Rubing Huang , Chunrong Fang , Zhenyu Chen

Code Vulnerability Detection Across Different Programming Languages with AI Models

Security vulnerabilities present in a code that has been written in diverse programming languages are among the most critical yet complicated aspects of source code to detect. Static analysis tools based on rule-based patterns usually do…

Cryptography and Security · Computer Science 2025-08-19 Hael Abdulhakim Ali Humran , Ferdi Sonmez

An Empirical Study of LLM-Based Code Clone Detection

Large language models (LLMs) have demonstrated remarkable capabilities in various software engineering tasks, such as code generation and debugging, because of their ability to translate between programming languages and natural languages.…

Software Engineering · Computer Science 2025-11-04 Wenqing Zhu , Norihiro Yoshida , Eunjong Choi , Yutaka Matsubara , Hiroaki Takada