Related papers: Detection of LLM-Generated Java Code Using Discret…

The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution

In this paper, we present the first large-scale study exploring whether JavaScript code generated by Large Language Models (LLMs) can reveal which model produced it, enabling reliable authorship attribution and model fingerprinting. With…

Cryptography and Security · Computer Science 2025-12-02 Norbert Tihanyi, Bilel Cherif, Richard A. Dubniczky, Mohamed Amine Ferrag, Tamás Bisztray

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

While automated vulnerability detection techniques have made promising progress in detecting security vulnerabilities, their scalability and applicability remain challenging. The remarkable performance of Large Language Models (LLMs), such…

Cryptography and Security · Computer Science 2024-10-24 Avishree Khare, Saikat Dutta, Ziyang Li, Alaia Solko-Breslin, Rajeev Alur, Mayur Naik

Program Slicing in the Era of Large Language Models

Program slicing is a critical technique in software engineering, enabling developers to isolate relevant portions of code for tasks such as bug detection, code comprehension, and debugging. In this study, we investigate the application of…

Software Engineering · Computer Science 2024-09-20 Kimya Khakzad Shahandashti, Mohammad Mahdi Mohajer, Alvine Boaye Belle, Song Wang, Hadi Hemmati

Understanding Defects in Generated Codes by Language Models

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications

Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis capabilities, which has led to their rapid adoption in software engineering applications. However, details about LLM training data are…

Software Engineering · Computer Science 2023-11-03 Jae Yong Lee, Sungmin Kang, Juyeon Yoon, Shin Yoo

Large Language Model for Vulnerability Detection: Emerging Results and Future Directions

Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks from scratch. Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable…

Software Engineering · Computer Science 2024-01-30 Xin Zhou, Ting Zhang, David Lo

Assessing the Code Clone Detection Capability of Large Language Models

This study aims to assess the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, in the task of code clone detection. The evaluation involves testing the models on a variety of code pairs of different clone types…

Software Engineering · Computer Science 2024-07-03 Zixian Zhang, Takfarinas Saber

Automatic Detection of LLM-Generated Code: A Comparative Case Study of Contemporary Models Across Function and Class Granularities

The adoption of Large Language Models (LLMs) for code generation risks incorporating vulnerable code into software systems. Existing detectors face two critical limitations: a lack of systematic cross-model validation and opaque "black box"…

Software Engineering · Computer Science 2025-12-23 Musfiqur Rahman, SayedHassan Khatoonabadi, Ahmad Abdellatif, Emad Shihab

Secure Coding with AI -- From Detection to Repair

While several studies have examined the security of code generated by GPT and other Large Language Models (LLMs), most have relied on controlled experiments rather than real developer interactions. This paper investigates the security of…

Software Engineering · Computer Science 2026-02-19 Vladislav Belozerov, Peter J Barclay, Ashkan Sami

Measuring Determinism in Large Language Models for Software Code Review

Large Language Models (LLMs) promise to streamline software code reviews, but their ability to produce consistent assessments remains an open question. In this study, we tested four leading LLMs -- GPT-4o mini, GPT-4o, Claude 3.5 Sonnet,…

Software Engineering · Computer Science 2025-03-03 Eugene Klishevich, Yegor Denisov-Blanch, Simon Obstbaum, Igor Ciobanu, Michal Kosinski

I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution

Source code authorship attribution is important in software forensics, plagiarism detection, and protecting software patch integrity. Existing techniques often rely on supervised machine learning, which struggles with generalization across…

Software Engineering · Computer Science 2025-01-15 Soohyeon Choi, Yong Kiam Tan, Mark Huasong Meng, Mohamed Ragab, Soumik Mondal, David Mohaisen, Khin Mi Mi Aung

Bugs in Large Language Models Generated Code: An Empirical Study

Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream in Software Engineering (SE), i.e.,…

Software Engineering · Computer Science 2024-03-19 Florian Tambon, Arghavan Moradi Dakhel, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, Giuliano Antoniol

Investigating the Efficacy of Large Language Models for Code Clone Detection

Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to…

Software Engineering · Computer Science 2024-01-31 Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata

Secret Breach Detection in Source Code with Large Language Models

Background: Leaking sensitive information - such as API keys, tokens, and credentials - in source code remains a persistent security threat. Traditional regex and entropy-based tools often generate high false positives due to limited…

Software Engineering · Computer Science 2025-07-29 Md Nafiu Rahman, Sadif Ahmed, Zahin Wahab, S M Sohan, Rifat Shahriyar

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

Generative Pretrained Transformers (GPTs) are foundational Large Language Models (LLMs) for text generation. However, individual LLMs often produce inconsistent outputs and exhibit biases, limiting their representation of diverse language…

Computation and Language · Computer Science 2025-08-06 Mari Ashiga, Wei Jie, Fan Wu, Vardan Voskanyan, Fateme Dinmohammadi, Paul Brookes, Jingzhi Gong, Zheng Wang

Automatic Generation of a Cryptography Misuse Taxonomy Using Large Language Models

The prevalence of cryptographic API misuse (CAM) is compromising the effectiveness of cryptography and in turn the security of modern systems and applications. Despite extensive efforts to develop CAM detection tools, these tools typically…

Cryptography and Security · Computer Science 2025-09-16 Yang Zhang, Wenyi Ouyang, Yi Zhang, Liang Cheng, Chen Wu, Wenxin Hu

CodeVision: Detecting LLM-Generated Code Using 2D Token Probability Maps and Vision Models

The rise of large language models (LLMs) like ChatGPT has significantly improved automated code generation, enhancing software development efficiency. However, this introduces challenges in academia, particularly in distinguishing between…

Software Engineering · Computer Science 2025-01-08 Zhenyu Xu, Victor S. Sheng

Can Large Language Models Find And Fix Vulnerable Software?

In this study, we evaluated the capability of Large Language Models (LLMs), particularly OpenAI's GPT-4, in detecting software vulnerabilities, comparing their performance against traditional static code analyzers like Snyk and Fortify. Our…

Software Engineering · Computer Science 2023-08-22 David Noever

Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text

The development of Generative AI Large Language Models (LLMs) raised the alarm regarding identifying content produced through generative AI or humans. In one case, issues arise when students heavily rely on such tools in a manner that can…

Computation and Language · Computer Science 2025-01-07 Ayat Najjar, Huthaifa I. Ashqar, Omar Darwish, Eman Hammad

Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models

The growing trend of vulnerability issues in software development as a result of a large dependence on open-source projects has received considerable attention recently. This paper investigates the effectiveness of Large Language Models…

Software Engineering · Computer Science 2024-09-17 Shaznin Sultana, Sadia Afreen, Nasir U. Eisty