Related papers: Variable Name Recovery in Decompiled Binary Code u…

Symbol Preference Aware Generative Models for Recovering Variable Names from Stripped Binary

Decompilation aims to recover the source code form of a binary executable. It has many security applications, such as malware analysis, vulnerability detection, and code hardening. A prominent challenge in decompilation is to recover…

Software Engineering · Computer Science 2024-12-10 Xiangzhe Xu, Zhuo Zhang, Zian Su, Ziyang Huang, Shiwei Feng, Yapeng Ye, Nan Jiang, Danning Xie, Siyuan Cheng, Lin Tan, Xiangyu Zhang

DIRE: A Neural Approach to Decompiled Identifier Naming

The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Decompilers can reconstruct much of the information…

Software Engineering · Computer Science 2019-10-04 Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, Bogdan Vasilescu

Augmenting Decompiler Output with Learned Variable Names and Types

A common tool used by security professionals for reverse-engineering binaries found in the wild is the decompiler. A decompiler attempts to reverse compilation, transforming a binary to a higher-level language such as C. High-level…

Software Engineering · Computer Science 2021-08-17 Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, Graham Neubig, Bogdan Vasilescu

Improving type information inferred by decompilers with supervised machine learning

In software reverse engineering, decompilation is the process of recovering source code from binary files. Decompilers are used when it is necessary to understand or analyze software for which the source code is not available. Although…

Software Engineering · Computer Science 2021-02-25 Javier Escalada, Ted Scully, Francisco Ortin

VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

Variable names are critical for conveying intended program behavior. Machine learning-based program analysis methods use variable name representations for a wide range of tasks, such as suggesting new variable names and bug detection.…

Software Engineering · Computer Science 2021-12-07 Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Graham Neubig, Bogdan Vasilescu, Claire Le Goues

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring

Refactoring is an indispensable practice of improving the quality and maintainability of source code in software evolution. Rename refactoring is the most frequently performed refactoring that suggests a new name for an identifier to…

Software Engineering · Computer Science 2023-05-30 Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji

Semantics-Recovering Decompilation through Neural Machine Translation

Decompilation transforms low-level program languages (PL) (e.g., binary code) into high-level PLs (e.g., C/C++). It has been widely used when analysts perform security analysis on software (systems) whose source code is unavailable, such as…

Cryptography and Security · Computer Science 2022-01-03 Ruigang Liang, Ying Cao, Peiwei Hu, Jinwen He, Kai Chen

Disentangling Representations of Text by Masking Transformers

Representations from large pretrained models such as BERT encode a range of features into monolithic vectors, affording strong predictive accuracy across a multitude of downstream tasks. In this paper we explore whether it is possible to…

Computation and Language · Computer Science 2021-09-14 Xiongyi Zhang, Jan-Willem van de Meent, Byron C. Wallace

ReF Decompile: Relabeling and Function Call Enhanced Decompile

The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages, enabling analysis in scenarios where source code is unavailable. This task supports various reverse…

Software Engineering · Computer Science 2025-02-19 Yunlong Feng, Bohan Li, Xiaoming Shi, Qingfu Zhu, Wanxiang Che

Constraint-Guided Multi-Agent Decompilation for Executable Binary Recovery

Decompilation -- recovering source code from compiled binaries -- is essential for security analysis, malware reverse engineering, and legacy software maintenance. However, existing decompilers produce code that often fails to compile or…

Software Engineering · Computer Science 2026-05-05 Yifan Zhang, Xiaohan Wang, Yueke Zhang, Yu Huang, Kevin Leach

A Neural-based Program Decompiler

Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the…

Programming Languages · Computer Science 2019-07-01 Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

Reverse engineering binaries is required to understand and analyse programs for which the source code is unavailable. Decompilers can transform the largely unreadable binaries into a more readable source code-like representation. However,…

Cryptography and Security · Computer Science 2023-01-16 Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen

Can Neural Decompilation Assist Vulnerability Prediction on Binary Code?

Vulnerability prediction is valuable in identifying security issues efficiently, even though it requires the source code of the target software system, which is a restrictive hypothesis. This paper presents an experimental study to predict…

Cryptography and Security · Computer Science 2025-04-01 D. Cotroneo, F. C. Grasso, R. Natella, V. Orbinato

Towards Neural Decompilation

We address the problem of automatic decompilation, converting a program in low-level representation back to a higher-level human-readable programming language. The problem of decompilation is extremely important for security researchers.…

Programming Languages · Computer Science 2019-05-22 Omer Katz, Yuval Olshaker, Yoav Goldberg, Eran Yahav

Maybe Deep Neural Networks are the Best Choice for Modeling Source Code

Statistical language modeling techniques have successfully been applied to source code, yielding a variety of new software development tools, such as tools for code suggestion and improving readability. A major issue with these techniques…

Software Engineering · Computer Science 2019-03-15 Rafael-Michael Karampatsis, Charles Sutton

Trim My View: An LLM-Based Code Query System for Module Retrieval in Robotic Firmware

The software compilation process has a tendency to obscure the original design of the system and makes it difficult both to identify individual components and discern their purpose simply by examining the resulting binary code. Although…

Cryptography and Security · Computer Science 2025-03-07 Sima Arasteh, Pegah Jandaghi, Nicolaas Weideman, Dennis Perepech, Mukund Raghothaman, Christophe Hauser, Luis Garcia

Automated Vulnerability Detection in Source Code Using Deep Representation Learning

Each year, software vulnerabilities are discovered, which pose significant risks of exploitation and system compromise. We present a convolutional neural network model that can successfully identify bugs in C code. We trained our model…

Cryptography and Security · Computer Science 2026-02-27 C. Seas, G. Fitzpatrick, J. A. Hamilton, M. C. Carlisle

Learning to Find Usages of Library Functions in Optimized Binaries

Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler…

Software Engineering · Computer Science 2021-09-20 Toufique Ahmed, Premkumar Devanbu, Anand Ashok Sawant

Semantic-aware Binary Code Representation with BERT

A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code. Recently, binary analysis techniques based on machine learning have…

Cryptography and Security · Computer Science 2021-06-11 Hyungjoon Koo, Soyeon Park, Daejin Choi, Taesoo Kim

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether…

Computation and Language · Computer Science 2021-10-29 Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample