Related papers: Extending Source Code Pre-Trained Language Models …

200 papers

In software reverse engineering, decompilation is the process of recovering source code from binary files. Decompilers are used when it is necessary to understand or analyze software for which the source code is not available. Although…

Software Engineering · Computer Science 2021-02-25 Javier Escalada, Ted Scully, Francisco Ortin

Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler…

Software Engineering · Computer Science 2021-09-20 Toufique Ahmed, Premkumar Devanbu, Anand Ashok Sawant

Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical…

Cryptography and Security · Computer Science 2024-11-08 Dylan Manuel, Nafis Tanveer Islam, Joseph Khoury, Ana Nunez, Elias Bou-Harb, Peyman Najafirad

Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the…

Programming Languages · Computer Science 2019-07-01 Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao

Decompilers are useful tools used in reverse engineering to understand compiled source code. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are…

Software Engineering · Computer Science 2026-05-13 Alexander Shypula, Osbert Bastani, Edward Schwartz

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose…

Programming Languages · Computer Science 2025-08-06 Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang

Code summarization is a critical task in natural language processing and software engineering, which aims to generate concise descriptions of source code. Recent advancements have improved the quality of these summaries, enhancing code…

Computation and Language · Computer Science 2025-02-25 Vladimir Makharev, Vladimir Ivanov

Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code has limited semantic information, which makes it…

Software Engineering · Computer Science 2022-10-12 Yifan Zhang

The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages, enabling analysis in scenarios where source code is unavailable. This task supports various reverse…

Software Engineering · Computer Science 2025-02-19 Yunlong Feng, Bohan Li, Xiaoming Shi, Qingfu Zhu, Wanxiang Che

Binary rewriting is a rapidly-maturing technique for modifying software for instrumentation, customization, optimization, and hardening without access to source code. Unfortunately, the practical applications of binary rewriting tools are…

Software Engineering · Computer Science 2022-09-09 Eric Schulte, Michael D. Brown, Vlad Folts

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung

Decompilation is widely used in reverse engineering to recover high-level language code from binary executables. While recent approaches leveraging Large Language Models (LLMs) have shown promising progress, they typically treat assembly…

Software Engineering · Computer Science 2025-09-19 Yongpan Wang, Xin Xu, Xiaojie Zhu, Xiaodong Gu, Beijun Shen

Recent advances in LLM-based decompilers have been shown effective to convert low-level binaries into human-readable source code. However, there still lacks a comprehensive benchmark that provides large-scale binary-source function pairs,…

Software Engineering · Computer Science 2025-10-21 Hanzhuo Tan, Xiaolong Tian, Hanrui Qi, Jiaming Liu, Zuchen Gao, Siyi Wang, Qi Luo, Jing Li, Yuqun Zhang

Binary decompilation is a critical reverse engineering task aimed at reconstructing high-level source code from stripped executables. Although Large Language Models (LLMs) have recently shown promise, they often suffer from "logical…

Software Engineering · Computer Science 2026-04-15 Qiang Zhang, Zhongnian Li

Binary code analysis and comprehension is critical to applications in reverse engineering and computer security tasks where source code is not available. Unfortunately, unlike source code, binary code lacks semantics and is more difficult…

Software Engineering · Computer Science 2025-09-29 Yifan Zhang, Chen Huang, Yueke Zhang, Huajie Shao, Kevin Leach, Yu Huang

Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code,…

Software Engineering · Computer Science 2025-05-01 Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

Binary decompilation aims to recover binaries into high-level source code, but existing evaluations mainly rely on syntactic similarity or single-axis readability metrics, which fail to capture practical reusability. We propose a…

Software Engineering · Computer Science 2026-05-29 Puzhuo Liu, Yuhan Huang, Jianlei Chi, Peng Di, Yu Jiang

The software compilation process has a tendency to obscure the original design of the system and makes it difficult both to identify individual components and discern their purpose simply by examining the resulting binary code. Although…

Cryptography and Security · Computer Science 2025-03-07 Sima Arasteh, Pegah Jandaghi, Nicolaas Weideman, Dennis Perepech, Mukund Raghothaman, Christophe Hauser, Luis Garcia

Descriptive comments play a crucial role in the software engineering process. They decrease development time, enable better bug detection, and facilitate the reuse of previously written code. However, comments are commonly the last of a…

Computation and Language · Computer Science 2019-04-02 Jessica Moore, Ben Gelman, David Slater

Source code summarizing is a task of writing short, natural language descriptions of source code behavior during run time. Such summaries are extremely useful for software development and maintenance but are expensive to manually…

Machine Learning · Computer Science 2020-04-03 Vivek Gupta