Related papers: Enhancing Function Name Prediction using Votes-Bas…

BLens: Contrastive Captioning of Binary Functions using Ensemble Embedding

Function names can greatly aid human reverse engineers, which has spurred the development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying…

Machine Learning · Computer Science 2025-02-04 Tristan Benoit, Yunru Wang, Moritz Dannehl, Johannes Kinder

XFL: Naming Functions in Binaries with Extreme Multi-label Learning

Reverse engineers benefit from the presence of identifiers such as function names in a binary, but usually these are removed for release. Training a machine learning model to predict function names automatically is promising but…

Cryptography and Security · Computer Science 2022-12-13 James Patrick-Evans, Moritz Dannehl, Johannes Kinder

AGNOMIN -- Architecture Agnostic Multi-Label Function Name Prediction

Function name prediction is crucial for understanding stripped binaries in software reverse engineering, a key step for enabling subsequent vulnerability analysis and patching. However, existing approaches often struggle with…

Software Engineering · Computer Science 2025-10-02 Yonatan Gizachew Achamyeleh, Tongtao Zhang, Joshua Hyunki Kim, Gabriel Garcia, Shih-Yuan Yu, Anton Kocheturov, Mohammad Abdullah Al Faruque

In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks

In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and…

Machine Learning · Computer Science 2021-02-05 Fiorella Artuso, Giuseppe Antonio Di Luna, Luca Massarelli, Leonardo Querzoni

Foundation Model is Efficient Multimodal Multitask Model Selector

This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring,…

Machine Learning · Computer Science 2023-08-14 Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo

Software Ethology: An Accurate, Resilient, and Cross-Architecture Binary Analysis Framework

When reverse engineering a binary, the analyst must first understand the semantics of the binary's functions through either manual or automatic analysis. Manual semantic analysis is time-consuming, because abstractions provided by high…

Cryptography and Security · Computer Science 2020-07-02 Derrick McKee, Nathan Burow, Mathias Payer

Function Basis Encoding of Numerical Features in Factorization Machines

Factorization machine (FM) variants are widely used for large scale real-time content recommendation systems, since they offer an excellent balance between model accuracy and low computational costs for training and inference. These systems…

Machine Learning · Computer Science 2025-01-03 Alex Shtoff, Elie Abboud, Rotem Stram, Oren Somekh

TaskEval: Synthesised Evaluation for Foundation-Model Tasks

Hallucinations are a key concern when creating applications that rely on Foundation models (FMs). Understanding where and how these subtle failures occur in an application relies on evaluation methods known as evals. Prior work…

Artificial Intelligence · Computer Science 2025-12-08 Dilani Widanapathiranage, Scott Barnett, Stefanus Kurniawan, Wannita Takerngsaksiri

A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning

Subword tokenization is a commonly used input pre-processing step in most recent NLP models. However, it limits the models' ability to leverage end-to-end task learning. Its frequency-based vocabulary creation compromises tokenization in…

Computation and Language · Computer Science 2022-04-25 Md Mofijul Islam, Gustavo Aguilar, Pragaash Ponnusamy, Clint Solomon Mathialagan, Chengyuan Ma, Chenlei Guo

A New Citation Recommendation Strategy Based on Term Functions in Related Studies Section

Purpose: Researchers frequently encounter the following problems when writing scientific articles: (1) Selecting appropriate citations to support the research idea is challenging. (2) The literature review is not conducted extensively,…

Information Retrieval · Computer Science 2021-05-04 Haihua Chen

Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Recent advancements in imitation learning have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. While excelling at zero-shot generation of robust behaviors, BFMs…

Machine Learning · Computer Science 2026-03-30 Ron Vainshtein, Zohar Rimon, Shie Mannor, Chen Tessler

Path-Based Function Embedding and its Application to Specification Mining

Identifying the relationships among program elements is useful for program understanding, debugging, and analysis. One such relationship is synonymy. Function synonyms are functions that play a similar role in code, e.g. functions that…

Software Engineering · Computer Science 2018-02-27 Daniel DeFreez, Aditya V. Thakur, Cindy Rubio-González

Symbol Preference Aware Generative Models for Recovering Variable Names from Stripped Binary

Decompilation aims to recover the source code form of a binary executable. It has many security applications, such as malware analysis, vulnerability detection, and code hardening. A prominent challenge in decompilation is to recover…

Software Engineering · Computer Science 2024-12-10 Xiangzhe Xu, Zhuo Zhang, Zian Su, Ziyang Huang, Shiwei Feng, Yapeng Ye, Nan Jiang, Danning Xie, Siyuan Cheng, Lin Tan, Xiangyu Zhang

Practice in Synonym Extraction at Large Scale

Synonym extraction is an important task in natural language processing and often used as a submodule in query expansion, question answering and other applications. Automatic synonym extractor is highly preferred for large scale…

Computation and Language · Computer Science 2015-06-02 Liangliang Cao, Chang Wang

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users are aided by queries containing a name and sending back to the web search engines for finding their will.…

Computation and Language · Computer Science 2020-07-23 Aviad Elyashar, Rami Puzis, Michael Fire

Renovating Names in Open-Vocabulary Segmentation Benchmarks

Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger

ReF Decompile: Relabeling and Function Call Enhanced Decompile

The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages, enabling analysis in scenarios where source code is unavailable. This task supports various reverse…

Software Engineering · Computer Science 2025-02-19 Yunlong Feng, Bohan Li, Xiaoming Shi, Qingfu Zhu, Wanxiang Che

Multi-task Learning based Pre-trained Language Model for Code Completion

Code completion is one of the most useful features in the Integrated Development Environments (IDEs), which can accelerate software development by suggesting the next probable token based on the contextual code in real-time. Recent studies…

Software Engineering · Computer Science 2021-01-01 Fang Liu, Ge Li, Yunfei Zhao, Zhi Jin

ReSIM: Re-ranking Binary Similarity Embeddings to Improve Function Search Performance

Binary Function Similarity (BFS), the problem of determining whether two binary functions originate from the same source code, has been extensively studied in recent research across security, software engineering, and machine learning…

Cryptography and Security · Computer Science 2026-02-24 Gianluca Capozzi, Anna Paola Giancaspro, Fabio Petroni, Leonardo Querzoni, Giuseppe Antonio Di Luna

ProTranslator: zero-shot protein function prediction using textual description

Accurately finding proteins and genes that have a certain function is the prerequisite for a broad range of biomedical applications. Despite the encouraging progress of existing computational approaches in protein function prediction, it…

Quantitative Methods · Quantitative Biology 2022-04-22 Hanwen Xu, Sheng Wang