Related papers: ast2vec: Utilizing Recursive Neural Encodings of P…

Process Mining Embeddings: Learning Vector Representations for Petri Nets

Process Mining offers a powerful framework for uncovering, analyzing, and optimizing real-world business processes. Petri nets provide a versatile means of modeling process behavior. However, traditional methods often struggle to…

Artificial Intelligence · Computer Science 2024-08-01 Juan G. Colonna, Ahmed A. Fares, Márcio Duarte, Ricardo Sousa

Learning Program Representations with a Tree-Structured Transformer

Learning vector representations for programs is a critical step in applying deep learning techniques for program understanding tasks. Various neural network models are proposed to learn from tree-structured program representations, e.g.,…

Software Engineering · Computer Science 2023-01-10 Wenhan Wang, Kechi Zhang, Ge Li, Shangqing Liu, Anran Li, Zhi Jin, Yang Liu

Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring

We propose a new application of embedding techniques for problem retrieval in adaptive tutoring. The objective is to retrieve problems whose mathematical concepts are similar. There are two challenges: First, like sentences, problems…

Computers and Society · Computer Science 2020-03-25 Du Su, Ali Yekkehkhany, Yi Lu, Wenmiao Lu

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer program is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and…

Artificial Intelligence · Computer Science 2021-11-16 Tenzin Jinpa, Yong Gao

phylo2vec: a library for vector-based phylogenetic tree manipulation

Phylogenetics is a fundamental component of evolutionary analysis frameworks in biology and linguistics. Recently, the advent of large-scale genomics and the SARS-CoV-2 pandemic has highlighted the necessity for phylogenetic software to…

Populations and Evolution · Quantitative Biology 2025-10-29 Neil Scheidwasser, Ayush Nag, Matthew J Penn, Anthony MV Jakob, Frederik Mølkjær Andersen, Mark P Khurana, Landung Setiawan, David A Duchêne, Samir Bhatt

Analysing the Behaviour of Tree-Based Neural Networks in Regression Tasks

The landscape of deep learning has vastly expanded the frontiers of source code analysis, particularly through the utilization of structural representations such as Abstract Syntax Trees (ASTs). While these methodologies have demonstrated…

Machine Learning · Computer Science 2024-06-18 Peter Samoaa, Mehrdad Farahani, Antonio Longa, Philipp Leitner, Morteza Haghir Chehreghani

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict…

Machine Learning · Computer Science 2018-10-31 Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

Automatic Repair and Type Binding of Undeclared Variables using Neural Networks

Deep learning has been used in program analysis to predict hidden software defects using software defect datasets, to detect security vulnerabilities using generative adversarial networks, and to identify syntax errors by learning a…

Software Engineering · Computer Science 2019-07-16 Venkatesh Theru Mohan, Ali Jannesari

Dis-S2V: Discourse Informed Sen2Vec

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation,…

Machine Learning · Computer Science 2018-11-30 Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training

We introduce Domain2Vec, a novel approach that decomposes any dataset into a linear combination of several meta-domains, a new concept designed to capture the key underlying features of datasets. Domain2Vec…

Computation and Language · Computer Science 2025-06-13 Mozhi Zhang, Howe Tissue, Lu Wang, Xipeng Qiu

AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models

The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the…

Computation and Language · Computer Science 2022-09-13 José Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado, Houari Sahraoui

Attributed Network Embedding via Subspace Discovery

Network embedding aims to learn latent, low-dimensional vector representations of network nodes that are effective in supporting various network analytic tasks. While prior art on network embedding focuses primarily on preserving network topology…

Social and Information Networks · Computer Science 2019-05-21 Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang

pyRDF2Vec: A Python Implementation and Extension of RDF2Vec

This paper introduces pyRDF2Vec, a Python software package that reimplements the well-known RDF2Vec algorithm along with several of its extensions. By making the algorithm available in the most popular data science language, and by bundling…

Machine Learning · Computer Science 2022-05-06 Gilles Vandewiele, Bram Steenwinckel, Terencio Agozzino, Femke Ongenae

motif2vec: Motif Aware Node Representation Learning for Heterogeneous Networks

Recent years have witnessed a surge of interest in machine learning on graphs and networks with applications ranging from vehicular network design to IoT traffic management to social network recommendations. Supervised machine learning…

Social and Information Networks · Computer Science 2019-08-23 Manoj Reddy Dareddy, Mahashweta Das, Hao Yang

Phylo2Vec: a vector representation for binary trees

Binary phylogenetic trees inferred from biological data are central to understanding the shared history among evolutionary units. However, inferring the placement of latent nodes in a tree is computationally expensive. State-of-the-art…

Populations and Evolution · Quantitative Biology 2025-03-26 Matthew J Penn, Neil Scheidwasser, Mark P Khurana, David A Duchêne, Christl A Donnelly, Samir Bhatt

PSDVec: a Toolbox for Incremental and Scalable Word Embedding

PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping of words in a natural language to continuous vectors which encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding…

Computation and Language · Computer Science 2016-07-05 Shaohua Li, Jun Zhu, Chunyan Miao

Dynamic Neural Program Embedding for Program Repair

Neural program embeddings have shown much promise recently for a variety of program analysis tasks, including program synthesis, program repair, fault localization, etc. However, most existing program embeddings are based on syntactic…

Artificial Intelligence · Computer Science 2018-07-03 Ke Wang, Rishabh Singh, Zhendong Su

VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python

Context: Identifying potentially vulnerable code is important to improve the security of our software systems. However, the manual detection of software vulnerabilities requires expert knowledge and is time-consuming, and must be supported by…

Cryptography and Security · Computer Science 2022-01-24 Laura Wartschinski, Yannic Noller, Thomas Vogel, Timo Kehrer, Lars Grunske