Related papers: CREBench: Evaluating Large Language Models in Cryp…

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such…

Cryptography and Security · Computer Science 2026-05-01 Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin

SoK: Potentials and Challenges of Large Language Models for Reverse Engineering

Reverse Engineering (RE) is central to software security, enabling tasks such as vulnerability discovery and malware analysis, but it remains labor-intensive and requires substantial expertise. Earlier advances in deep learning start to…

Cryptography and Security · Computer Science 2025-09-29 Xinyu Hu, Zhiwei Fu, Shaocong Xie, Steven H. H. Ding, Philippe Charland

Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering

This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting…

Software Engineering · Computer Science 2024-06-12 Saman Pordanesh, Benjamin Tan

REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering

Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLMs) offer potential for improving RE efficiency…

Cryptography and Security · Computer Science 2026-03-09 Darrin Lea, James Ghawaly, Golden Richard, Aisha Ali-Gombe, Andrew Case

CrackMeBench: Binary Reverse Engineering for Agents

Benchmarks for coding agents increasingly measure source-level software repair, and cybersecurity benchmarks increasingly measure broad capture-the-flag performance. Classical binary reverse engineering remains less precisely specified:…

Software Engineering · Computer Science 2026-05-12 Isaac David, Arthur Gervais

BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models

Binary analysis remains pivotal in software security, offering insights into compiled programs without source code access. As large language models (LLMs) continue to excel in diverse language understanding and generation tasks, their…

Software Engineering · Computer Science 2025-05-13 Xiuwei Shang, Guoqiang Chen, Shaoyin Cheng, Benlong Wu, Li Hu, Gangyang Li, Weiming Zhang, Nenghai Yu

How Far Have We Gone in Binary Code Understanding Using Large Language Models

Binary code analysis plays a pivotal role in various software security applications, such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code, understanding binary…

Software Engineering · Computer Science 2024-10-25 Xiuwei Shang, Shaoyin Cheng, Guoqiang Chen, Yanming Zhang, Li Hu, Xiao Yu, Gangyang Li, Weiming Zhang, Nenghai Yu

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code,…

Software Engineering · Computer Science 2025-05-01 Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

ReCopilot: Reverse Engineering Copilot in Binary Analysis

Binary analysis plays a pivotal role in security domains such as malware detection and vulnerability discovery, yet it remains labor-intensive and heavily reliant on expert knowledge. General-purpose large language models (LLMs) perform…

Cryptography and Security · Computer Science 2025-05-23 Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying

Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries

Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical…

Cryptography and Security · Computer Science 2024-11-08 Dylan Manuel, Nafis Tanveer Islam, Joseph Khoury, Ana Nunez, Elias Bou-Harb, Peyman Najafirad

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have potential to cause real-world impact. Policymakers, model providers, and researchers in the AI and…

Cryptography and Security · Computer Science 2025-04-15 Andy K. Zhang, Neil Perry, Riya Dulepet, Joey Ji, Celeste Menders, Justin W. Lin, Eliot Jones, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, Daniel E. Ho, Percy Liang

CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments

The advancement of Large Language Models (LLMs) has raised concerns regarding their dual-use potential in cybersecurity. Existing evaluation frameworks overwhelmingly focus on Information Technology (IT) environments, failing to capture the…

Cryptography and Security · Computer Science 2026-04-08 Gustav Keppler, Moritz Gstür, Veit Hagenmeyer

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Cryptographic algorithms are fundamental to modern security, yet their implementations frequently harbor subtle logic flaws that are hard to detect. We introduce CryptoScope, a novel framework for automated cryptographic vulnerability…

Cryptography and Security · Computer Science 2025-08-18 Zhihao Li, Zimo Ji, Tao Zheng, Hao Ren, Xiao Lan

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

Deobfuscating binary code remains a fundamental challenge in reverse engineering, as obfuscation is widely used to hinder analysis and conceal program logic. Although large language models (LLMs) have shown promise in recovering semantics…

Software Engineering · Computer Science 2026-04-10 Li Hu, Xiuwei Shang, Jieke Shi, Shaoyin Cheng, Junqi Zhang, Gangyang Li, Zhou Yang, Weiming Zhang, David Lo

CFG2VEC: Hierarchical Graph Neural Network for Cross-Architectural Software Reverse Engineering

Mission-critical embedded software is critical to our society's infrastructure but can be subject to new security vulnerabilities as technology advances. When security issues arise, Reverse Engineers (REs) use Software Reverse Engineering…

Software Engineering · Computer Science 2023-01-10 Shih-Yuan Yu, Yonatan Gizachew Achamyeleh, Chonghan Wang, Anton Kocheturov, Patrick Eisen, Mohammad Abdullah Al Faruque

AICrypto: Evaluating Cryptography Capabilities of Large Language Models

We build AICrypto, a comprehensive benchmark designed to evaluate the cryptography capabilities of large language models (LLMs). The benchmark comprises 135 multiple-choice questions, 150 capture-the-flag challenges, and 30 proof…

Cryptography and Security · Computer Science 2026-05-28 Yu Wang, Yijian Liu, Liheng Ji, Han Luo, Wenjie Li, Xiaofei Zhou, Chiyun Feng, Puji Wang, Yuhan Cao, Geyuan Zhang, Xiaojian Li, Rongwu Xu, Yilei Chen, Tianxing He

HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation

Large language models (LLMs) are being increasingly integrated into practical hardware and firmware development pipelines for code generation. Existing studies have primarily focused on evaluating the functional correctness of LLM-generated…

Cryptography and Security · Computer Science 2026-01-21 Qirui Chen, Jingxian Shuai, Shuangwu Chen, Shenghao Ye, Zijian Wen, Xufei Su, Jie Jin, Jiangming Li, Jun Chen, Xiaobin Tan, Jian Yang

Narrowing the Complexity Gap in the Evaluation of Large Language Models

Evaluating Large Language Models (LLMs) with respect to real-world code complexity is essential. Otherwise, there is a risk of overestimating LLMs' programming abilities based on simplistic benchmarks, only to be disappointed when using…

Software Engineering · Computer Science 2026-02-24 Yang Chen, Shuyang Liu, Reyhaneh Jabbarvand

RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models

Large Language Models (LLMs) have exhibited significant proficiency in code debugging, especially in automatic program repair, which may substantially reduce the time consumption of developers and enhance their efficiency. Significant…

Software Engineering · Computer Science 2025-09-09 Jingjing Liu, Zeming Liu, Zihao Cheng, Mengliang He, Xiaoming Shi, Yuhang Guo, Xiangrong Zhu, Yuanfang Guo, Yunhong Wang, Haifeng Wang

RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic…

Cryptography and Security · Computer Science 2026-02-02 Yanlin Wang, Ziyao Zhang, Chong Wang, Xinyi Xu, Mingwei Liu, Yong Wang, Jiachi Chen, Zibin Zheng