Related papers: Enhancing Function Name Prediction using Votes-Bas…
Function names can greatly aid human reverse engineers, which has spurred the development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying…
Reverse engineers benefit from the presence of identifiers such as function names in a binary, but usually these are removed for release. Training a machine learning model to predict function names automatically is promising but…
Function name prediction is crucial for understanding stripped binaries in software reverse engineering, a key step for \textbf{enabling subsequent vulnerability analysis and patching}. However, existing approaches often struggle with…
In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and…
This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring,…
When reverse engineering a binary, the analyst must first understand the semantics of the binary's functions through either manual or automatic analysis. Manual semantic analysis is time-consuming, because abstractions provided by high…
Factorization machine (FM) variants are widely used for large scale real-time content recommendation systems, since they offer an excellent balance between model accuracy and low computational costs for training and inference. These systems…
Hallucinations are a key concern when creating applications that rely on Foundation models (FMs). Understanding where and how these subtle failures occur in an application relies on evaluation methods known as \textit{evals}. Prior work…
Subword tokenization is a commonly used input pre-processing step in most recent NLP models. However, it limits the models' ability to leverage end-to-end task learning. Its frequency-based vocabulary creation compromises tokenization in…
Purpose: Researchers frequently encounter the following problems when writing scientific articles: (1) Selecting appropriate citations to support the research idea is challenging. (2) The literature review is not conducted extensively,…
Recent advancements in imitation learning have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. While excelling at zero-shot generation of robust behaviors, BFMs…
Identifying the relationships among program elements is useful for program understanding, debugging, and analysis. One such relationship is synonymy. Function synonyms are functions that play a similar role in code, e.g. functions that…
Decompilation aims to recover the source code form of a binary executable. It has many security applications, such as malware analysis, vulnerability detection, and code hardening. A prominent challenge in decompilation is to recover…
Synonym extraction is an important task in natural language processing and often used as a submodule in query expansion, question answering and other applications. Automatic synonym extractor is highly preferred for large scale…
Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users are aided by queries containing a name and sending back to the web search engines for finding their will.…
Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked…
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages, enabling analysis in scenarios where source code is unavailable. This task supports various reverse…
Code completion is one of the most useful features in the Integrated Development Environments (IDEs), which can accelerate software development by suggesting the next probable token based on the contextual code in real-time. Recent studies…
Binary Function Similarity (BFS), the problem of determining whether two binary functions originate from the same source code, has been extensively studied in recent research across security, software engineering, and machine learning…
Accurately finding proteins and genes that have a certain function is the prerequisite for a broad range of biomedical applications. Despite the encouraging progress of existing computational approaches in protein function prediction, it…