Related papers: Neural Code Comprehension: A Learnable Representat…

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length $\textit{code vector}$, which can be used to predict…

Machine Learning · Computer Science · 2018-10-31 · Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

Learning Blended, Precise Semantic Program Embeddings

Learning neural program embeddings is key to utilizing deep neural networks in program languages research --- precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks.…

Software Engineering · Computer Science · 2019-07-12 · Ke Wang, Zhendong Su

IR2Vec: LLVM IR based Scalable Program Embeddings

We propose IR2Vec, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow…

Programming Languages · Computer Science · 2020-12-25 · S. VenkataKeerthy, Rohit Aggarwal, Shalini Jain, Maunendra Sankar Desarkar, Ramakrishna Upadrasta, Y. N. Srikant

Dynamic Neural Program Embedding for Program Repair

Neural program embeddings have shown much promise recently for a variety of program analysis tasks, including program synthesis, program repair, fault localization, etc. However, most existing program embeddings are based on syntactic…

Artificial Intelligence · Computer Science · 2018-07-03 · Ke Wang, Rishabh Singh, Zhendong Su

Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks

In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts to individual neural network filter responses, where interesting…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-30 · Ruth Fong, Andrea Vedaldi

Towards Demystifying Dimensions of Source Code Embeddings

Source code representations are key in applying machine learning techniques for processing and analyzing programs. A popular approach in representing source code is neural source code embeddings that represents programs with…

Machine Learning · Computer Science · 2022-06-17 · Md Rafiqul Islam Rabin, Arjun Mukherjee, Omprakash Gnawali, Mohammad Amin Alipour

Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning…

Software Engineering · Computer Science · 2018-08-21 · Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

A Literature Study of Embeddings on Source Code

Natural language processing has improved tremendously after the success of word embedding techniques such as word2vec. Recently, the same idea has been applied on source code with encouraging results. In this survey, we aim to collect and…

Machine Learning · Computer Science · 2019-04-08 · Zimin Chen, Martin Monperrus

Learning Semantic Representations for Novel Words: Leveraging Both Form and Context

Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data.…

Computation and Language · Computer Science · 2018-11-12 · Timo Schick, Hinrich Schütze

Learning Program Semantics with Code Representations: An Empirical Study

Program semantics learning is the core and fundamental for various code intelligent tasks e.g., vulnerability detection, clone detection. A considerable amount of existing works propose diverse approaches to learn the program semantics for…

Software Engineering · Computer Science · 2022-03-23 · Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu

A Survey On Neural Word Embeddings

Understanding human language has been a sub-challenge on the way of intelligent machines. The study of meaning in natural language processing (NLP) relies on the distributional hypothesis where language elements get meaning from the words…

Computation and Language · Computer Science · 2021-10-06 · Erhan Sezerer, Selma Tekir

Towards a Theoretical Understanding of Word and Relation Representation

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science · 2022-02-02 · Carl Allen

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Semantic code search is the task of retrieving relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires to bridge the semantic gap between the programming language and…

Computation and Language · Computer Science · 2022-01-28 · Chen Wu, Ming Yan

sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings

Neural word representations have proven useful in Natural Language Processing (NLP) tasks due to their ability to efficiently model complex semantic and syntactic word relationships. However, most techniques model only one representation…

Computation and Language · Computer Science · 2015-11-23 · Andrew Trask, Phil Michalak, John Liu

Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

Translating a program written in one programming language to another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, they may…

Machine Learning · Computer Science · 2018-03-14 · Nghi D. Q. Bui, Lingxiao Jiang

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words,…

Computation and Language · Computer Science · 2016-07-25 · Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen

graph2vec: Learning Distributed Representations of Graphs

Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph…

Artificial Intelligence · Computer Science · 2017-07-18 · Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, Shantanu Jaiswal

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to…

Computation and Language · Computer Science · 2018-06-12 · Yu-An Chung, James Glass

On the Embeddings of Variables in Recurrent Neural Networks for Source Code

Source code processing heavily relies on the methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that the semantics…

Software Engineering · Computer Science · 2021-04-28 · Nadezhda Chirkova

Neural Embeddings for Text

We propose a new kind of embedding for natural language text that deeply represents semantic meaning. Standard text embeddings use the outputs from hidden layers of a pretrained language model. In our method, we let a language model learn…

Computation and Language · Computer Science · 2022-11-22 · Oleg Vasilyev, John Bohannon