Related papers: How Far Have We Gone in Binary Code Understanding …

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code,…

Software Engineering · Computer Science 2025-05-01 Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering

This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting…

Software Engineering · Computer Science 2024-06-12 Saman Pordanesh, Benjamin Tan

BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models

Binary analysis remains pivotal in software security, offering insights into compiled programs without source code access. As large language models (LLMs) continue to excel in diverse language understanding and generation tasks, their…

Software Engineering · Computer Science 2025-05-13 Xiuwei Shang, Guoqiang Chen, Shaoyin Cheng, Benlong Wu, Li Hu, Gangyang Li, Weiming Zhang, Nenghai Yu

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

Deobfuscating binary code remains a fundamental challenge in reverse engineering, as obfuscation is widely used to hinder analysis and conceal program logic. Although large language models (LLMs) have shown promise in recovering semantics…

Software Engineering · Computer Science 2026-04-10 Li Hu, Xiuwei Shang, Jieke Shi, Shaoyin Cheng, Junqi Zhang, Gangyang Li, Zhou Yang, Weiming Zhang, David Lo

ReCopilot: Reverse Engineering Copilot in Binary Analysis

Binary analysis plays a pivotal role in security domains such as malware detection and vulnerability discovery, yet it remains labor-intensive and heavily reliant on expert knowledge. General-purpose large language models (LLMs) perform…

Cryptography and Security · Computer Science 2025-05-23 Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying

Leveraging Artificial Intelligence on Binary Code Comprehension

Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code has limited semantic information, which makes it…

Software Engineering · Computer Science 2022-10-12 Yifan Zhang

Large Language Model (LLM) for Software Security: Code Analysis, Malware Analysis, Reverse Engineering

Large Language Models (LLMs) have recently emerged as powerful tools in cybersecurity, offering advanced capabilities in malware detection, generation, and real-time monitoring. Numerous studies have explored their application in…

Cryptography and Security · Computer Science 2025-04-11 Hamed Jelodar, Samita Bai, Parisa Hamedi, Hesamodin Mohammadian, Roozbeh Razavi-Far, Ali Ghorbani

A Contemporary Survey of Large Language Model Assisted Program Analysis

The increasing complexity of software systems has driven significant advancements in program analysis, as traditional methods are unable to meet the demands of modern software development. To address these limitations, deep learning techniques,…

Software Engineering · Computer Science 2025-02-27 Jiayimei Wang, Tao Ni, Wei-Bin Lee, Qingchuan Zhao

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks. Their capacity to comprehend and generate human-like code has spurred research into…

Software Engineering · Computer Science 2024-03-07 Chongzhou Fang, Ning Miao, Shaurya Srivastav, Jialin Liu, Ruoyu Zhang, Ruijie Fang, Asmita, Ryan Tsang, Najmeh Nazari, Han Wang, Houman Homayoun

Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries

Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical…

Cryptography and Security · Computer Science 2024-11-08 Dylan Manuel, Nafis Tanveer Islam, Joseph Khoury, Ana Nunez, Elias Bou-Harb, Peyman Najafirad

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such…

Cryptography and Security · Computer Science 2026-05-01 Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin

Empirical Study of Code Large Language Models for Binary Security Patch Detection

Security patch detection (SPD) is crucial for maintaining software security, as unpatched vulnerabilities can lead to severe security risks. In recent years, numerous learning-based SPD approaches have demonstrated promising results on…

Software Engineering · Computer Science 2025-09-09 Qingyuan Li, Binchang Li, Cuiyun Gao, Shuzheng Gao, Zongjie Li

The Code Barrier: What LLMs Actually Understand?

Understanding code represents a core ability needed for automating software development tasks. While foundation models like LLMs show impressive results across many software engineering challenges, the extent of their true semantic…

Software Engineering · Computer Science 2025-04-16 Serge Lionel Nikiema, Jordan Samhi, Abdoul Kader Kaboré, Jacques Klein, Tegawendé F. Bissyandé

Binary Diff Summarization using Large Language Models

Security of software supply chains is necessary to ensure that software updates do not contain maliciously injected code or introduce vulnerabilities that may compromise the integrity of critical infrastructure. Verifying the integrity of…

Cryptography and Security · Computer Science 2025-09-30 Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy, Prashant Anantharaman, Sean Carrick, Ramesh Karri, Farshad Khorrami

Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey

Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile,…

Software Engineering · Computer Science 2023-08-08 Shihan Dou, Junjie Shan, Haoxiang Jia, Wenhao Deng, Zhiheng Xi, Wei He, Yueming Wu, Tao Gui, Yang Liu, Xuanjing Huang

Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets

Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing…

Software Engineering · Computer Science 2025-03-25 Hamed Jelodar, Mohammad Meymani, Roozbeh Razavi-Far

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…

Cryptography and Security · Computer Science 2025-02-14 Karl Tamberg, Hayretdin Bahsi

SoK: Potentials and Challenges of Large Language Models for Reverse Engineering

Reverse Engineering (RE) is central to software security, enabling tasks such as vulnerability discovery and malware analysis, but it remains labor-intensive and requires substantial expertise. Earlier advances in deep learning start to…

Cryptography and Security · Computer Science 2025-09-29 Xinyu Hu, Zhiwei Fu, Shaocong Xie, Steven H. H. Ding, Philippe Charland

AI-Guided Exploration of Large-Scale Codebases

Understanding large-scale, complex software systems is a major challenge for developers, who spend a significant portion of their time on program comprehension. Traditional tools such as static visualizations and reverse engineering…

Software Engineering · Computer Science 2025-08-11 Yoseph Berhanu Alebachew

Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models

Binary code summarization, while invaluable for understanding code semantics, is challenging due to its labor-intensive nature. This study delves into the potential of large language models (LLMs) for binary code comprehension. To this end,…

Cryptography and Security · Computer Science 2023-12-18 Xin Jin, Jonathan Larson, Weiwei Yang, Zhiqiang Lin