Related papers: A Cross-Architecture Instruction Embedding Model f…

200 papers

Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas…

Software Engineering · Computer Science 2018-12-18 Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo, Qiang Zeng, Zhexin Zhang

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung

A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors.…

Cryptography and Security · Computer Science 2022-08-16 Fiorella Artuso, Marco Mormando, Giuseppe A. Di Luna, Leonardo Querzoni

Bilingual word embeddings have been widely used to capture the similarity of lexical semantics in different human languages. However, many applications, such as cross-lingual semantic search and question answering, can be largely benefited…

Computation and Language · Computer Science 2019-09-10 Muhao Chen, Yingtao Tian, Haochen Chen, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo

Binary code analysis has immense importance in the research domain of software security. Today, software is very often compiled for various Instruction Set Architectures (ISAs). As a result, cross-architecture binary code analysis has…

Software Engineering · Computer Science 2024-05-01 Iftakhar Ahmad, Lannan Luo

Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code has limited semantic information, which makes it…

Software Engineering · Computer Science 2022-10-12 Yifan Zhang

Cross-architecture binary similarity comparison is essential in many security applications. Recently, researchers have proposed learning-based approaches to improve comparison performance. They adopted a paradigm of instruction…

Cryptography and Security · Computer Science 2022-06-29 Qige Song, Yongzheng Zhang, Shuhao Li

Binary code similarity detection is to detect the similarity of code at binary (assembly) level without source code. Existing works have their limitations when dealing with mutated binary code generated by different compiling options. In…

Cryptography and Security · Computer Science 2023-08-08 Zian Liu

Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code,…

Software Engineering · Computer Science 2025-05-01 Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

Matching binary to source code and vice versa has various applications in different fields, such as computer security, software engineering, and reverse engineering. Even though there exist methods that try to match source code with binary…

Software Engineering · Computer Science 2023-04-11 Ali TehraniJamsaz, Hanze Chen, Ali Jannesari

Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in fields of software engineering and security, which has many important applications (e.g.,…

Cryptography and Security · Computer Science 2019-07-03 Yikun Hu, Hui Wang, Yuanyuan Zhang, Bodong Li, Dawu Gu

Natural Language Processing (NLP), a cornerstone field within artificial intelligence, has been increasingly utilized in the field of materials science literature. Our study conducts a reproducibility analysis of two pioneering works within…

Chemical Physics · Physics 2023-08-01 Xiangyun Lei, Edward Kim, Viktoriia Baibakova, Shijing Sun

Natural language processing has improved tremendously after the success of word embedding techniques such as word2vec. Recently, the same idea has been applied on source code with encouraging results. In this survey, we aim to collect and…

Machine Learning · Computer Science 2019-04-08 Zimin Chen, Martin Monperrus

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g.,…

Machine Learning · Computer Science 2019-05-21 Milan Cvitkovic, Badal Singh, Anima Anandkumar

In recent years, natural language processing (NLP) has become integral to educational data mining, particularly in the analysis of student-generated language products. For research and assessment purposes, so-called embedding models are…

Computation and Language · Computer Science 2025-10-23 Tom Bleckmann, Paul Tschisgale

Binary analysis of software is a critical step in cyber forensics applications such as program vulnerability assessment and malware detection. This involves interpreting instructions executed by software and often necessitates converting…

Cryptography and Security · Computer Science 2022-04-15 Dinuka Sahabandu, Sukarno Mertoguno, Radha Poovendran

Distributional models that learn rich semantic word representations are a success story of recent NLP research. However, developing models that learn useful representations of phrases and sentences has proved far harder. We propose using…

Computation and Language · Computer Science 2016-03-23 Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio

Binary-source code matching plays an important role in many security and software engineering related tasks such as malware detection, reverse engineering and vulnerability assessment. Currently, several approaches have been proposed for…

Software Engineering · Computer Science 2022-01-20 Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, Hai Jin

Natural Language Processing (NLP) systems commonly leverage bag-of-words co-occurrence techniques to capture semantic and syntactic word relationships. The resulting word-level distributed representations often ignore morphological…

Computation and Language · Computer Science 2015-06-12 Andrew Trask, David Gilmore, Matthew Russell