Related papers: PCodeTrans: Translate Decompiled Pseudocode to Com…

A Neural-based Program Decompiler

Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the…

Programming Languages · Computer Science 2019-07-01 Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Binary decompilation is a critical reverse engineering task aimed at reconstructing high-level source code from stripped executables. Although Large Language Models (LLMs) have recently shown promise, they often suffer from "logical…

Software Engineering · Computer Science 2026-04-15 Qiang Zhang, Zhongnian Li

Constraint-Guided Multi-Agent Decompilation for Executable Binary Recovery

Decompilation -- recovering source code from compiled binaries -- is essential for security analysis, malware reverse engineering, and legacy software maintenance. However, existing decompilers produce code that often fails to compile or…

Software Engineering · Computer Science 2026-05-05 Yifan Zhang, Xiaohan Wang, Yueke Zhang, Yu Huang, Kevin Leach

LLM4Decompile: Decompiling Binary Code with Large Language Models

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose…

Programming Languages · Computer Science 2025-08-06 Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang

SALT4Decompile: Inferring Source-level Abstract Logic Tree for LLM-Based Binary Decompilation

Decompilation is widely used in reverse engineering to recover high-level language code from binary executables. While recent approaches leveraging Large Language Models (LLMs) have shown promising progress, they typically treat assembly…

Software Engineering · Computer Science 2025-09-19 Yongpan Wang, Xin Xu, Xiaojie Zhu, Xiaodong Gu, Beijun Shen

CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution

In this paper, we present an LLM-based code translation method and an associated tool called CoTran, that translates whole-programs from one high-level programming language to another. Existing LLM-based code translation methods lack…

Programming Languages · Computer Science 2024-10-31 Prithwish Jana, Piyush Jha, Haoyang Ju, Gautham Kishore, Aryan Mahajan, Vijay Ganesh

ReF Decompile: Relabeling and Function Call Enhanced Decompile

The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages, enabling analysis in scenarios where source code is unavailable. This task supports various reverse…

Software Engineering · Computer Science 2025-02-19 Yunlong Feng, Bohan Li, Xiaoming Shi, Qingfu Zhu, Wanxiang Che

Semantics-Recovering Decompilation through Neural Machine Translation

Decompilation transforms low-level program languages (PL) (e.g., binary code) into high-level PLs (e.g., C/C++). It has been widely used when analysts perform security analysis on software (systems) whose source code is unavailable, such as…

Cryptography and Security · Computer Science 2022-01-03 Ruigang Liang, Ying Cao, Peiwei Hu, Jinwen He, Kai Chen

Decaf: Improving Neural Decompilation with Automatic Feedback and Search

Decompilers are useful tools used in reverse engineering to understand compiled source code. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are…

Software Engineering · Computer Science 2026-05-13 Alexander Shypula, Osbert Bastani, Edward Schwartz

ReCopilot: Reverse Engineering Copilot in Binary Analysis

Binary analysis plays a pivotal role in security domains such as malware detection and vulnerability discovery, yet it remains labor-intensive and heavily reliant on expert knowledge. General-purpose large language models (LLMs) perform…

Cryptography and Security · Computer Science 2025-05-23 Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying

AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation

Code translation transforms programs from one programming language (PL) to another. Several rule-based transpilers have been designed to automate code translation between different pairs of PLs. However, the rules can become obsolete as the…

Software Engineering · Computer Science 2025-06-23 Ali Reza Ibrahimzada, Kaiyao Ke, Mrigank Pawagi, Muhammad Salman Abid, Rangeet Pan, Saurabh Sinha, Reyhaneh Jabbarvand

CODEFUSE-DEBENCH: An Empirical Study on Readability, Recompilability, and Functionality

Binary decompilation aims to recover binaries into high-level source code, but existing evaluations mainly rely on syntactic similarity or single-axis readability metrics, which fail to capture practical reusability. We propose a…

Software Engineering · Computer Science 2026-05-29 Puzhuo Liu, Yuhan Huang, Jianlei Chi, Peng Di, Yu Jiang

Superset Decompilation

Reverse engineering tools remain monolithic and imperative compared to the advancement of modern compiler architectures: analyses are tied to a single mutable representation, making them difficult to extend or refine, and forcing premature…

Programming Languages · Computer Science 2026-03-31 Chang Liu, Yihao Sun, Thomas Gilray, Kristopher Micinski

Can Emulating Semantic Translation Help LLMs with Code Translation? A Study Based on Pseudocode

Although large language models (LLMs) show promising potential in code translation, they still struggle to generate accurate translations using the commonly adopted direct code-to-code translation approach, which converts an original…

Software Engineering · Computer Science 2026-02-24 Songqiang Chen, Congying Xu, Jingyi Chen, Jialun Cao, Jiarong Wu, Shing-Chi Cheung

LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise…

Cryptography and Security · Computer Science 2026-04-08 Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

Currently, a growing number of mature natural language processing applications make people's life more convenient. Such applications are built by source code - the language in software engineering. However, the applications for…

Software Engineering · Computer Science 2021-05-13 Ahmed Elnaggar, Wei Ding, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Silvia Severini, Florian Matthes, Burkhard Rost

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

Reverse engineering binaries is required to understand and analyse programs for which the source code is unavailable. Decompilers can transform the largely unreadable binaries into a more readable source code-like representation. However,…

Cryptography and Security · Computer Science 2023-01-16 Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen

DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios

Decompilers are fundamental tools for critical security tasks, from vulnerability discovery to malware analysis, yet their evaluation remains fragmented. Existing approaches primarily focus on syntactic correctness through synthetic…

Software Engineering · Computer Science 2025-05-19 Zeyu Gao, Yuxin Cui, Hao Wang, Siliang Qin, Yuanda Wang, Bolun Zhang, Chao Zhang

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung

D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning

As one of the key tools in many security tasks, decompilers reconstruct human-readable source code from binaries. Yet, despite recent advances, their outputs often suffer from syntactic and semantic errors and remain difficult to read.…

Cryptography and Security · Computer Science 2025-08-19 Muqi Zou, Hongyu Cai, Hongwei Wu, Zion Leonahenahe Basque, Arslan Khan, Berkay Celik, Dave Tian, Antonio Bianchi, Ruoyu Wang, Dongyan Xu