Related papers: Path-Based Function Embedding and its Application …

Task2Vec: Task Embedding for Meta-Learning

We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined…

Machine Learning · Computer Science · 2019-02-12 · Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking

Embedding models have demonstrated strong performance in tasks like clustering, retrieval, and feature extraction while offering computational advantages over generative models and cross-encoders. Benchmarks such as MTEB have shown that…

Software Engineering · Computer Science · 2025-08-28 · Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu

A Comparison of Word2Vec, HMM2Vec, and PCA2Vec for Malware Classification

Word embeddings are often used in natural language processing as a means to quantify relationships between words. More generally, these same word embedding techniques can be used to quantify relationships between features. In this paper, we…

Cryptography and Security · Computer Science · 2021-03-11 · Aniket Chandak, Wendy Lee, Mark Stamp

Syntree2Vec - An algorithm to augment syntactic hierarchy into word embeddings

Word embeddings aim to map the sense of words into a lower-dimensional vector space in order to reason over them. Training embeddings on domain-specific data helps express concepts more relevant to their use case but comes at a cost of…

Computation and Language · Computer Science · 2018-08-20 · Shubham Bhardwaj

Synonym Detection Using Syntactic Dependency And Neural Embeddings

Recent advances on the Vector Space Model have significantly improved some NLP applications such as neural machine translation and natural language generation. Although word co-occurrences in context have been widely used in…

Computation and Language · Computer Science · 2022-10-03 · Dongqiang Yang, Pikun Wang, Xiaodong Sun, Ning Li

Towards a Measure of Algorithm Similarity

Given two algorithms for the same problem, can we determine whether they are meaningfully different? In full generality, the question is uncomputable, and empirically it is muddied by competing notions of similarity. Yet, in many…

Machine Learning · Computer Science · 2025-11-03 · Shairoz Sohail, Taher Ali

SAFE: Self-Attentive Function Embeddings for Binary Similarity

The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as…

Cryptography and Security · Computer Science · 2019-12-20 · Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Leonardo Querzoni, Roberto Baldoni

Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection

Software systems can be represented as graphs, capturing dependencies among functions and processes. An interesting aspect of software systems is that they can be represented as different types of graphs, depending on the extraction goals…

Machine Learning · Computer Science · 2025-10-14 · Kartikeya Aneja, Nagender Aneja, Murat Kantarcioglu

Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks

In this paper we consider the binary similarity problem, which consists in determining whether two binary functions are similar considering only their compiled form. This problem is known to be crucial in several application scenarios, such as…

Machine Learning · Computer Science · 2018-11-14 · Roberto Baldoni, Giuseppe Antonio Di Luna, Luca Massarelli, Fabio Petroni, Leonardo Querzoni

IdBench: Evaluating Semantic Representations of Identifier Names in Source Code

Identifier names convey useful information about the intended semantics of code. Name-based program analyses use this information, e.g., to detect bugs, to predict types, and to improve the readability of code. At the core of name-based…

Machine Learning · Computer Science · 2021-01-15 · Yaza Wainakh, Moiz Rauf, Michael Pradel

Process Mining Embeddings: Learning Vector Representations for Petri Nets

Process Mining offers a powerful framework for uncovering, analyzing, and optimizing real-world business processes. Petri nets provide a versatile means of modeling process behavior. However, traditional methods often struggle to…

Artificial Intelligence · Computer Science · 2024-08-01 · Juan G. Colonna, Ahmed A. Fares, Márcio Duarte, Ricardo Sousa

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict…

Machine Learning · Computer Science · 2018-10-31 · Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

Let's Simply Count: Quantifying Distributional Similarity Between Activities in Event Data

To obtain insights from event data, advanced process mining methods assess the similarity of activities to incorporate their semantic relations into the analysis. Here, distributional similarity that captures similarity from activity…

Databases · Computer Science · 2025-09-12 · Henrik Kirchmann, Stephan A. Fahrenkrog-Petersen, Xixi Lu, Matthias Weidlich

GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses

Code embedding is a keystone in the application of machine learning on several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is…

Software Engineering · Computer Science · 2022-01-24 · Wei Ma, Mengjie Zhao, Ezekiel Soremekun, Qiang Hu, Jie Zhang, Mike Papadakis, Maxime Cordy, Xiaofei Xie, Yves Le Traon

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science · 2018-07-11 · Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

Statistical Depth Meets Machine Learning: Kernel Mean Embeddings and Depth in Functional Data Analysis

Statistical depth is the act of gauging how representative a point is compared to a reference probability measure. The depth allows introducing rankings and orderings to data living in multivariate, or function spaces. Though widely applied…

Statistics Theory · Mathematics · 2021-05-28 · George Wynne, Stanislav Nagy

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users are aided by queries that contain a name and are sent to web search engines to find what they are looking for.…

Computation and Language · Computer Science · 2020-07-23 · Aviad Elyashar, Rami Puzis, Michael Fire

Force2Vec: Parallel force-directed graph embedding

A graph embedding algorithm embeds a graph into a low-dimensional space such that the embedding preserves the inherent properties of the graph. While graph embedding is fundamentally related to graph visualization, prior work did not…

Social and Information Networks · Computer Science · 2020-09-22 · Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad

Identifier-Free Code Embedding Models for Scalable Search

Function association is a useful process for binary reverse engineers. Search tools exist to perform association at scale, but they do not utilize the full range of capabilities that AI-enabled search provides. Prior work has explored the…

Cryptography and Security · Computer Science · 2026-05-08 · Eric Wolos, Michael Doyle

Import2vec - Learning Embeddings for Software Libraries

We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning…

Software Engineering · Computer Science · 2019-04-09 · Bart Theeten, Frederik Vandeputte, Tom Van Cutsem