Related papers: Semantic-aware Binary Code Representation with BER…

Estimation of semantic similarity is an important research problem both in natural language processing and the natural language understanding, and that has tremendous application on various downstream tasks such as question answering,…

Computation and Language · Computer Science 2025-06-24 R. Prashanth

PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have…

Software Engineering · Computer Science 2023-08-31 Xiangzhe Xu , Zhou Xuan , Shiwei Feng , Siyuan Cheng , Yapeng Ye , Qingkai Shi , Guanhong Tao , Le Yu , Zhuo Zhang , Xiangyu Zhang

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional…

Computation and Language · Computer Science 2019-05-28 Jacob Devlin , Ming-Wei Chang , Kenton Lee , Kristina Toutanova

BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors.…

Cryptography and Security · Computer Science 2022-08-16 Fiorella Artuso , Marco Mormando , Giuseppe A. Di Luna , Leonardo Querzoni

Semantics-aware BERT for Language Understanding

The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of success especially in various machine reading comprehension and natural language inference…

Computation and Language · Computer Science 2020-02-05 Zhuosheng Zhang , Yuwei Wu , Hai Zhao , Zuchao Li , Shuailiang Zhang , Xi Zhou , Xiang Zhou

Unsupervised Binary Code Translation with Application to Code Similarity Detection and Vulnerability Discovery

Binary code analysis has immense importance in the research domain of software security. Today, software is very often compiled for various Instruction Set Architectures (ISAs). As a result, cross-architecture binary code analysis has…

Software Engineering · Computer Science 2024-05-01 Iftakhar Ahmad , Lannan Luo

Transfer Fine-Tuning: A BERT Case Study

A semantic equivalence assessment is defined as a task that assesses semantic equivalence in a sentence pair by binary judgment (i.e., paraphrase identification) or grading (i.e., semantic textual similarity measurement). It constitutes a…

Computation and Language · Computer Science 2022-10-24 Yuki Arase , Junichi Tsujii

A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in fields of software engineering and security, which has many important applications (e.g.,…

Cryptography and Security · Computer Science 2019-07-03 Yikun Hu , Hui Wang , Yuanyuan Zhang , Bodong Li , Dawu Gu

BERT-based Ranking for Biomedical Entity Normalization

Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical…

Information Retrieval · Computer Science 2019-08-12 Zongcheng Ji , Qiang Wei , Hua Xu

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon , Steven H. H. Ding , Philippe Charland , Benjamin C. M. Fung

A text autoencoder from transformer for fast encoding language representation

In recent years BERT shows apparent advantages and great potential in natural language processing tasks. However, both training and applying BERT requires intensive time and resources for computing contextual language representations, which…

Computation and Language · Computer Science 2021-11-05 Tan Huang

BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis

Binary code clone analysis is an important technique which has a wide range of applications in software engineering (e.g., plagiarism detection, bug detection). The main challenge of the topic lies in the semantics-equivalent code…

Software Engineering · Computer Science 2018-08-21 Yikun Hu , Yuanyuan Zhang , Juanru Li , Hui Wang , Bodong Li , Dawu Gu

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and…

Computation and Language · Computer Science 2020-05-26 Han He , Jinho D. Choi

SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features

Models based on large-pretrained language models, such as S(entence)BERT, provide effective and efficient sentence embeddings that show high correlation to human similarity ratings, but lack interpretability. On the other hand, graph…

Computation and Language · Computer Science 2025-10-17 Juri Opitz , Anette Frank

Semantic Learning and Emulation Based Cross-platform Binary Vulnerability Seeker

Clone detection is widely exploited for software vulnerability search. The approaches based on source code analysis cannot be applied to binary clone detection because the same source code can produce significantly different binaries. In…

Cryptography and Security · Computer Science 2022-11-11 Jian Gao , Yu Jiang , Zhe Liu , Xin Yang , Cong Wang , Xun Jiao , Zijiang Yang , Jiaguang Sun

Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection

Semantic parsing is the task of transforming sentences from natural language into formal representations of predicate-argument structures. Under this research area, frame-semantic parsing has attracted much interest. This parsing approach…

Computation and Language · Computer Science 2019-11-01 Sang-Sang Tan , Jin-Cheon Na

Exploring the Role of BERT Token Representations to Explain Sentence Probing Results

Several studies have been carried out on revealing linguistic features captured by BERT. This is usually achieved by training a diagnostic classifier on the representations obtained from different layers of BERT. The subsequent…

Computation and Language · Computer Science 2021-09-14 Hosein Mohebbi , Ali Modarressi , Mohammad Taher Pilehvar

An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features

Tremendous amounts of multimedia associated with speech information are driving an urgent need to develop efficient and effective automatic summarization methods. To this end, we have seen rapid progress in applying supervised deep neural…

Computation and Language · Computer Science 2020-06-03 Shi-Yan Weng , Tien-Hong Lo , Berlin Chen

FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT

Machine based text comprehension has always been a significant research field in natural language processing. Once a full understanding of the text context and semantics is achieved, a deep learning model can be trained to solve a large…

Computation and Language · Computer Science 2020-09-03 Omar Mossad , Amgad Ahmed , Anandharaju Raju , Hari Karthikeyan , Zayed Ahmed

DABERT: Dual Attention Enhanced BERT for Semantic Matching

Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still suffer from insufficient ability to capture subtle differences. Minor noise like word…

Computation and Language · Computer Science 2023-04-17 Sirui Wang , Di Liang , Jian Song , Yuntao Li , Wei Wu