Related papers: Source Code Foundation Models are Transferable Bin…

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code,…

Software Engineering · Computer Science 2025-05-01 Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

Cross-modal Retrieval Models for Stripped Binary Analysis

Retrieving binary code via natural language queries is a pivotal capability for downstream tasks in the software security domain, such as vulnerability detection and malware analysis. However, it is challenging to identify binary functions…

Software Engineering · Computer Science 2026-01-06 Guoqiang Chen, Lingyun Ying, Ziyang Song, Daguang Liu, Qiang Wang, Zhiqi Wang, Li Hu, Shaoyin Cheng, Weiming Zhang, Nenghai Yu

How Far Have We Gone in Binary Code Understanding Using Large Language Models

Binary code analysis plays a pivotal role in various software security applications, such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code, understanding binary…

Software Engineering · Computer Science 2024-10-25 Xiuwei Shang, Shaoyin Cheng, Guoqiang Chen, Yanming Zhang, Li Hu, Xiao Yu, Gangyang Li, Weiming Zhang, Nenghai Yu

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

Reverse engineering binaries is required to understand and analyse programs for which the source code is unavailable. Decompilers can transform the largely unreadable binaries into a more readable source code-like representation. However,…

Cryptography and Security · Computer Science 2023-01-16 Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung

Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries

Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical…

Cryptography and Security · Computer Science 2024-11-08 Dylan Manuel, Nafis Tanveer Islam, Joseph Khoury, Ana Nunez, Elias Bou-Harb, Peyman Najafirad

GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching

Matching binary to source code and vice versa has various applications in different fields, such as computer security, software engineering, and reverse engineering. Even though there exist methods that try to match source code with binary…

Software Engineering · Computer Science 2023-04-11 Ali TehraniJamsaz, Hanze Chen, Ali Jannesari

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Binary decompilation is a critical reverse engineering task aimed at reconstructing high-level source code from stripped executables. Although Large Language Models (LLMs) have recently shown promise, they often suffer from "logical…

Software Engineering · Computer Science 2026-04-15 Qiang Zhang, Zhongnian Li

Cross-Language Binary-Source Code Matching with Intermediate Representations

Binary-source code matching plays an important role in many security and software engineering related tasks such as malware detection, reverse engineering and vulnerability assessment. Currently, several approaches have been proposed for…

Software Engineering · Computer Science 2022-01-20 Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, Hai Jin

Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering

This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting…

Software Engineering · Computer Science 2024-06-12 Saman Pordanesh, Benjamin Tan

Learning Binary Autoencoder-Based Codes with Progressive Training

Error correcting codes play a central role in digital communication, ensuring that transmitted information can be accurately reconstructed despite channel impairments. Recently, autoencoder (AE) based approaches have gained attention for…

Information Theory · Computer Science 2025-11-13 Vukan Ninkovic, Dejan Vukobratovic

Improving type information inferred by decompilers with supervised machine learning

In software reverse engineering, decompilation is the process of recovering source code from binary files. Decompilers are used when it is necessary to understand or analyze software for which the source code is not available. Although…

Software Engineering · Computer Science 2021-02-25 Javier Escalada, Ted Scully, Francisco Ortin

Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering

This proposal discusses the growing challenges in reverse engineering modern software binaries, particularly those compiled from newer system programming languages such as Rust, Go, and Mojo. Traditional reverse engineering techniques,…

Software Engineering · Computer Science 2025-06-05 Zhuo Zhuo, Xiangyu Zhang

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs…

Machine Learning · Computer Science 2026-04-22 Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

ReCopilot: Reverse Engineering Copilot in Binary Analysis

Binary analysis plays a pivotal role in security domains such as malware detection and vulnerability discovery, yet it remains labor-intensive and heavily reliant on expert knowledge. General-purpose large language models (LLMs) perform…

Cryptography and Security · Computer Science 2025-05-23 Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying

BinPRE: Enhancing Field Inference in Binary Analysis Based Protocol Reverse Engineering

Protocol reverse engineering (PRE) aims to infer the specification of network protocols when the source code is not available. Specifically, field inference is one crucial step in PRE to infer the field formats and semantics. To perform…

Software Engineering · Computer Science 2024-09-04 Jiayi Jiang, Xiyuan Zhang, Chengcheng Wan, Haoyi Chen, Haiying Sun, Ting Su

HARE: HumAn pRiors, a key to small language model Efficiency

Human priors play a crucial role in efficiently utilizing data in deep learning. However, with the development of large language models (LLMs), there is an increasing emphasis on scaling both model size and data volume, which often…

Computation and Language · Computer Science 2024-06-19 Lingyun Zhang, Bin jin, Gaojian Ge, Lunhui Liu, Xuewen Shen, Mingyong Wu, Houqian Zhang, Yongneng Jiang, Shiqi Chen, Shi Pu

Leveraging Artificial Intelligence on Binary Code Comprehension

Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code has limited semantic information, which makes it…

Software Engineering · Computer Science 2022-10-12 Yifan Zhang

Learning to Find Usages of Library Functions in Optimized Binaries

Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler…

Software Engineering · Computer Science 2021-09-20 Toufique Ahmed, Premkumar Devanbu, Anand Ashok Sawant