Related papers: SimCLF: A Simple Contrastive Learning Framework fo…

Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks

In this paper we consider the binary similarity problem that consists in determining if two binary functions are similar only considering their compiled form. This problem is known to be crucial in several application scenarios, such as…

Machine Learning · Computer Science 2018-11-14 Roberto Baldoni, Giuseppe Antonio Di Luna, Luca Massarelli, Fabio Petroni, Leonardo Querzoni

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung

SAFE: Self-Attentive Function Embeddings for Binary Similarity

The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as…

Cryptography and Security · Computer Science 2019-12-20 Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Leonardo Querzoni, Roberto Baldoni

SemDiff: Binary Similarity Detection by Diffing Key-Semantics Graphs

Binary similarity detection is a critical technique that has been applied in many real-world scenarios where source code is not available, e.g., bug search, malware analysis, and code plagiarism detection. Existing works are ineffective in…

Cryptography and Security · Computer Science 2023-08-04 Zian Liu, Zhi Zhang, Siqi Ma, Dongxi Liu, Jun Zhang, Chao Chen, Shigang Liu, Muhammad Ejaz Ahmed, Yang Xiang

Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection. Manual analysis is often time-consuming, but identifying commonly-used or previously-seen…

Machine Learning · Computer Science 2024-10-31 Rebecca Saul, Chang Liu, Noah Fleischmann, Richard Zak, Kristopher Micinski, Edward Raff, James Holt

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking

Embedding models have demonstrated strong performance in tasks like clustering, retrieval, and feature extraction while offering computational advantages over generative models and cross-encoders. Benchmarks such as MTEB have shown that…

Software Engineering · Computer Science 2025-08-28 Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu

BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis

Binary code clone analysis is an important technique which has a wide range of applications in software engineering (e.g., plagiarism detection, bug detection). The main challenge of the topic lies in the semantics-equivalent code…

Software Engineering · Computer Science 2018-08-21 Yikun Hu, Yuanyuan Zhang, Juanru Li, Hui Wang, Bodong Li, Dawu Gu

StriderSPD: Structure-Guided Joint Representation Learning for Binary Security Patch Detection

Vulnerabilities severely threaten software systems, making the timely application of security patches crucial for mitigating attacks. However, software vendors often silently patch vulnerabilities with limited disclosure, where Security…

Software Engineering · Computer Science 2026-01-12 Qingyuan Li, Chenchen Yu, Chuanyi Li, Xin-Cheng Wen, Cheryl Lee, Cuiyun Gao, Bin Luo

A Simple Framework for Contrastive Learning of Visual Representations

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank.…

Machine Learning · Computer Science 2020-07-02 Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. As IoT devices proliferate and rapidly evolve, their highly heterogeneous…

Software Engineering · Computer Science 2024-10-25 Xiuwei Shang, Li Hu, Shaoyin Cheng, Guoqiang Chen, Benlong Wu, Weiming Zhang, Nenghai Yu

Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations

Binary code analysis plays an essential role in cybersecurity, facilitating reverse engineering to reveal the inner workings of programs in the absence of source code. Traditional approaches, such as static and dynamic analysis, extract…

Cryptography and Security · Computer Science 2026-02-16 Jiyong Uhm, Minseok Kim, Michalis Polychronakis, Hyungjoon Koo

Binary Diff Summarization using Large Language Models

Security of software supply chains is necessary to ensure that software updates do not contain maliciously injected code or introduce vulnerabilities that may compromise the integrity of critical infrastructure. Verifying the integrity of…

Cryptography and Security · Computer Science 2025-09-30 Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy, Prashant Anantharaman, Sean Carrick, Ramesh Karri, Farshad Khorrami

UniASM: Binary Code Similarity Detection without Fine-tuning

Binary code similarity detection (BCSD) is widely used in various binary analysis tasks such as vulnerability search, malware detection, clone detection, and patch analysis. Recent studies have shown that the learning-based binary code…

Cryptography and Security · Computer Science 2025-02-21 Yeming Gu, Hui Shu, Fei Kang, Fan Hu

Unsupervised Detection of Fraudulent Transactions in E-commerce Using Contrastive Learning

With the rapid development of e-commerce, e-commerce platforms are facing an increasing number of fraud threats. Effectively identifying and preventing these fraudulent activities has become a critical research problem. Traditional fraud…

Machine Learning · Computer Science 2025-03-25 Xuan Li, Yuting Peng, Xiaoxuan Sun, Yifei Duan, Zhou Fang, Tengda Tang

A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in fields of software engineering and security, which has many important applications (e.g.,…

Cryptography and Security · Computer Science 2019-07-03 Yikun Hu, Hui Wang, Yuanyuan Zhang, Bodong Li, Dawu Gu

ReSIM: Re-ranking Binary Similarity Embeddings to Improve Function Search Performance

Binary Function Similarity (BFS), the problem of determining whether two binary functions originate from the same source code, has been extensively studied in recent research across security, software engineering, and machine learning…

Cryptography and Security · Computer Science 2026-02-24 Gianluca Capozzi, Anna Paola Giancaspro, Fabio Petroni, Leonardo Querzoni, Giuseppe Antonio Di Luna

FuncFooler: A Practical Black-box Attack Against Learning-based Binary Code Similarity Detection Methods

The binary code similarity detection (BCSD) method measures the similarity of two binary executable codes. Recently, the learning-based BCSD methods have achieved great success, outperforming traditional BCSD in detection accuracy and…

Cryptography and Security · Computer Science 2022-08-31 Lichen Jia, Bowen Tang, Chenggang Wu, Zhe Wang, Zihan Jiang, Yuanming Lai, Yan Kang, Ning Liu, Jingfeng Zhang

Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training a more generalizable AI model has been widely recognized in medical…

Computer Vision and Pattern Recognition · Computer Science 2022-08-31 Tianyuan Yao, Chang Qu, Jun Long, Quan Liu, Ruining Deng, Yuanhan Tian, Jiachen Xu, Aadarsh Jha, Zuhayr Asad, Shunxing Bao, Mengyang Zhao, Agnes B. Fogo, Bennett A. Landman, Haichun Yang, Catie Chang, Yuankai Huo

Semantic-Aware Contrastive Fine-Tuning: Boosting Multimodal Malware Classification with Discriminative Embeddings

The rapid evolution of malware variants requires robust classification methods to enhance cybersecurity. While Large Language Models (LLMs) offer potential for generating malware descriptions to aid family classification, their utility is…

Cryptography and Security · Computer Science 2025-05-01 Ivan Montoya Sanchez, Shaswata Mitra, Aritran Piplai, Sudip Mittal

CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning

In reinforcement learning (RL), it is challenging to learn directly from high-dimensional observations, where data augmentation has recently been shown to remedy this via encoding invariances from raw pixels. Nevertheless, we empirically…

Machine Learning · Computer Science 2023-12-20 Chenyu Sun, Hangwei Qian, Chunyan Miao