Related papers: Open Vocabulary Learning on Source Code with a Gra…

Maybe Deep Neural Networks are the Best Choice for Modeling Source Code

Statistical language modeling techniques have successfully been applied to source code, yielding a variety of new software development tools, such as tools for code suggestion and improving readability. A major issue with these techniques…

Software Engineering · Computer Science 2019-03-15 Rafael-Michael Karampatsis, Charles Sutton

Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code

Statistical language modeling techniques have successfully been applied to large source code corpora, yielding a variety of new software development tools, such as tools for code suggestion, improving readability, and API migration. A major…

Software Engineering · Computer Science 2020-03-19 Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, Andrea Janes

Generative Code Modeling with Graphs

Generative models for source code are an interesting structured prediction problem, requiring to reason about both hard syntactic and semantic constraints as well as about natural, likely programs. We present a novel model for this problem…

Machine Learning · Computer Science 2019-04-18 Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov

Graph-to-Sequence Learning using Gated Graph Neural Networks

Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures on this setting obtained promising results compared to grammar-based approaches but still rely on linearisation…

Computation and Language · Computer Science 2018-06-27 Daniel Beck, Gholamreza Haffari, Trevor Cohn

Learning to Represent Programs with Graphs

Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code's known syntax. For…

Machine Learning · Computer Science 2018-05-08 Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi

On the Embeddings of Variables in Recurrent Neural Networks for Source Code

Source code processing heavily relies on the methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that the semantics…

Software Engineering · Computer Science 2021-04-28 Nadezhda Chirkova

CodeKGC: Code Language Model for Generative Knowledge Graph Construction

Current generative knowledge graph construction approaches usually fail to capture structural knowledge by simply flattening natural language into serialized texts or a specification language. However, large generative language model…

Computation and Language · Computer Science 2024-01-19 Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen, Ningyu Zhang

CodeGRU: Context-aware Deep Learning with Gated Recurrent Unit for Source Code Modeling

Recently deep learning based Natural Language Processing (NLP) models have shown great potential in the modeling of source code. However, a major limitation of these approaches is that they take source code as simple tokens of text and…

Neural and Evolutionary Computing · Computer Science 2020-07-15 Yasir Hussain, Zhiqiu Huang, Yu Zhou, Senzhang Wang

Scalable Micro-planned Generation of Discourse from Structured Data

We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically employ…

Computation and Language · Computer Science 2019-10-08 Anirban Laha, Parag Jain, Abhijit Mishra, Karthik Sankaranarayanan

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word…

Computation and Language · Computer Science 2022-03-22 Yinhua Piao, Sangseon Lee, Dohoon Lee, Sun Kim

Gated Graph Sequence Neural Networks

Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is…

Machine Learning · Computer Science 2017-09-26 Yujia Li, Daniel Tarlow, Marc Brockschmidt, Richard Zemel

Unsupervised Construction of Knowledge Graphs From Text and Code

The scientific literature is a rich source of information for data mining with conceptual knowledge graphs; the open science movement has enriched this literature with complementary source code that implements scientific models. To exploit…

Machine Learning · Computer Science 2019-08-27 Kun Cao, James Fairbanks

GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding

Programming languages possess rich semantic information - such as data flow - that is represented by graphs and not available from the surface form of source code. Recent code language models have scaled to billions of parameters, but model…

Computation and Language · Computer Science 2025-09-24 Ziyin Zhang, Hang Yu, Shijie Li, Peng Di, Jianguo Li, Rui Wang

Modeling Order in Neural Word Embeddings at Scale

Natural Language Processing (NLP) systems commonly leverage bag-of-words co-occurrence techniques to capture semantic and syntactic word relationships. The resulting word-level distributed representations often ignore morphological…

Computation and Language · Computer Science 2015-06-12 Andrew Trask, David Gilmore, Matthew Russell

Graph Neural Networks for Natural Language Processing: A Survey

Deep learning has become the dominant approach in coping with various tasks in Natural Language Processing (NLP). Although text inputs are typically represented as a sequence of tokens, there is a rich variety of NLP problems that can be best…

Computation and Language · Computer Science 2022-10-21 Lingfei Wu, Yu Chen, Kai Shen, Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Given a closed-source program, such as most of proprietary software and viruses, binary code analysis is indispensable for many tasks, such as code plagiarism detection and malware analysis. Today, source code is very often compiled for…

Cryptography and Security · Computer Science 2018-12-27 Kimberly Redmond, Lannan Luo, Qiang Zeng

From Text to Graph: Leveraging Graph Neural Networks for Enhanced Explainability in NLP

Researchers have relegated natural language processing tasks to Transformer-type models, particularly generative models, because these models exhibit high versatility when performing generation and classification tasks. As the size of these…

Computation and Language · Computer Science 2025-04-04 Fabio Yáñez-Romero, Andrés Montoyo, Armando Suárez, Yoan Gutiérrez, Ruslan Mitkov

Neural Sketch Learning for Conditional Program Generation

We study the problem of generating source code in a strongly typed, Java-like programming language, given a label (for example a set of API calls or types) carrying a small amount of information about the code that is desired. The generated…

Programming Languages · Computer Science 2018-04-16 Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, Chris Jermaine

A Neural Model for Generating Natural Language Summaries of Program Subroutines

Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance.…

Software Engineering · Computer Science 2019-02-07 Alexander LeClair, Siyuan Jiang, Collin McMillan

Improved Code Summarization via a Graph Neural Network

Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of…

Software Engineering · Computer Science 2020-04-08 Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan