Related papers: Selecting and Combining Large Language Models for …

Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey

Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile,…

Software Engineering · Computer Science 2023-08-08 Shihan Dou , Junjie Shan , Haoxiang Jia , Wenhao Deng , Zhiheng Xi , Wei He , Yueming Wu , Tao Gui , Yang Liu , Xuanjing Huang

The Struggles of LLMs in Cross-lingual Code Clone Detection

With the involvement of multiple programming languages in modern software development, cross-lingual code clone detection has gained traction within the software engineering community. Numerous studies have explored this topic, proposing…

Software Engineering · Computer Science 2025-05-07 Micheline Bénédicte Moumoula , Abdoul Kader Kabore , Jacques Klein , Tegawendé Bissyande

An Empirical Study of LLM-Based Code Clone Detection

Large language models (LLMs) have demonstrated remarkable capabilities in various software engineering tasks, such as code generation and debugging, because of their ability to translate between programming languages and natural languages.…

Software Engineering · Computer Science 2025-11-04 Wenqing Zhu , Norihiro Yoshida , Eunjong Choi , Yutaka Matsubara , Hiroaki Takada

Assessing the Code Clone Detection Capability of Large Language Models

This study aims to assess the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, in the task of code clone detection. The evaluation involves testing the models on a variety of code pairs of different clone types…

Software Engineering · Computer Science 2024-07-03 Zixian Zhang , Takfarinas Saber

Evaluating Small-Scale Code Models for Code Clone Detection

Detecting code clones is relevant to software maintenance and code refactoring. This challenge still presents unresolved cases, mainly when structural similarity does not reflect functional equivalence, though recent code models show…

Software Engineering · Computer Science 2025-06-16 Jorge Martinez-Gil

An ensemble learning approach for software semantic clone detection

Code clone is a serious problem in software and has the potential to software defects, maintenance overhead, and licensing violations. Therefore, clone detection is important for reducing maintenance effort and improving code quality during…

Software Engineering · Computer Science 2020-10-12 Min Fu , Gang Luo , Xi Zheng , Tianyi Zhang , Dongjin Yu , Miryung Kim

Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation

Code vulnerability detection is crucial for ensuring the security and reliability of modern software systems. Recently, Large Language Models (LLMs) have shown promising capabilities in this domain. However, notable discrepancies in…

Software Engineering · Computer Science 2025-09-19 Zhihong Sun , Jia Li , Yao Wan , Chuanyi Li , Hongyu Zhang , Zhi jin , Ge Li , Hong Liu , Chen Lyu , Songlin Hu

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian , Chenhui Cui , Rubing Huang , Chunrong Fang , Zhenyu Chen

Using Ensemble Inference to Improve Recall of Clone Detection

Large-scale source-code clone detection is a challenging task. In our previous work, we proposed an approach (SSCD) that leverages artificial neural networks and approximates nearest neighbour search to effectively and efficiently locate…

Software Engineering · Computer Science 2024-02-13 Gul Aftab Ahmed , James Vincent Patten , Yuanhua Han , Guoxian Lu , David Gregg , Jim Buckley , Muslim Chochlov

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

While automated vulnerability detection techniques have made promising progress in detecting security vulnerabilities, their scalability and applicability remain challenging. The remarkable performance of Large Language Models (LLMs), such…

Cryptography and Security · Computer Science 2024-10-24 Avishree Khare , Saikat Dutta , Ziyang Li , Alaia Solko-Breslin , Rajeev Alur , Mayur Naik

Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience

Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks (e.g., code completion and code generation). By leveraging huge existing code corpora (e.g., GitHub),…

Software Engineering · Computer Science 2025-01-16 Xin Yin , Chao Ni , Xiaodan Xu , Xinrui Li , Xiaohu Yang

Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection

Code smells are symptoms of potential code quality problems that may affect software maintainability, thus increasing development costs and impacting software reliability. Large language models (LLMs) have shown remarkable capabilities for…

Software Engineering · Computer Science 2026-01-16 Saymon Souza , Amanda Santana , Eduardo Figueiredo , Igor Muzetti , João Eduardo Montandon , Lionel Briand

From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models

The issue of clone code has persisted in software engineering, primarily because developers often copy and paste code segments. This common practice has elevated the importance of clone code detection, garnering attention from both software…

Software Engineering · Computer Science 2025-05-12 QiHong Chen , Lianghao Jiang , Iftekhar Ahmed

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…

Cryptography and Security · Computer Science 2025-02-14 Karl Tamberg , Hayretdin Bahsi

Investigating the Efficacy of Large Language Models for Code Clone Detection

Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to…

Software Engineering · Computer Science 2024-01-31 Mohamad Khajezade , Jie JW Wu , Fatemeh Hendijani Fard , Gema Rodríguez-Pérez , Mohamed Sami Shehata

Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification

Code review is a crucial practice in software development. As code review nowadays is lightweight, various issues can be identified, and sometimes, they can be trivial. Research has investigated automated approaches to classify review…

Software Engineering · Computer Science 2025-08-14 Linh Nguyen , Chunhua Liu , Hong Yi Lin , Patanamon Thongtanunam

Error Understanding in Program Code With LLM-DL for Multi-label Classification

Programming is a core skill in computer science and software engineering (SE), yet identifying and resolving code errors remains challenging for both novice and experienced developers. While Large Language Models (LLMs) have shown…

Software Engineering · Computer Science 2026-03-27 Md Faizul Ibne Amin , Yutaka Watanobe , Md. Mostafizer Rahman , Daniel M. Muepu , Md. Shahajada Mia

A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection

Deep learning-based approaches, particularly those leveraging pre-trained language models (PLMs), have shown promise in automated software vulnerability detection. However, existing methods are predominantly limited to specific programming…

Software Engineering · Computer Science 2025-05-13 Junji Yu , Honglin Shu , Michael Fu , Dong Wang , Chakkrit Tantithamthavorn , Yasutaka Kamei , Junjie Chen

Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models

The growing trend of vulnerability issues in software development as a result of a large dependence on open-source projects has received considerable attention recently. This paper investigates the effectiveness of Large Language Models…

Software Engineering · Computer Science 2024-09-17 Shaznin Sultana , Sadia Afreen , Nasir U. Eisty

Cross-Language Source Code Clone Detection Using Deep Learning with InferCode

Software clones are beneficial to detect security gaps and software maintenance in one programming language or across multiple languages. The existing work on source clone detection performs well but in a single programming language.…

Software Engineering · Computer Science 2022-05-11 Mohammad A. Yahya , Dae-Kyoo Kim